Solomon Hykes on Docker, Dagger, and the Future of DevOps

In this episode, Solomon Hykes discusses the journey from Docker's inception to its widespread adoption, the challenges faced in open-source development, and his current work with Dagger. He explains how Dagger aims to revolutionize continuous integration by making pipelines more modular and efficient, addressing the "push and pray" problem in software development. Hykes also shares insights on the evolution of DevOps, the complexities of open-source business models, and his vision for the future of software development workflows.

Guest: Solomon Hykes, Co-founder of Docker and Dagger

Solomon Hykes is the co-founder and CEO of Dagger.io, the first programmable CI/CD engine. Before that, he was the co-founder of Docker, where he served for 10 years as CEO then CTO, and a founding member of the CNCF Technical Oversight Committee. Solomon grew up in France, and now lives in San Francisco.

Solomon Hykes - Twitter

“The Future of Linux Containers”

Links to interesting things from this episode:

Transcript

Intro: 00:04

You're listening to the Platform Engineering Podcast, your expert guide to the fascinating world of platform engineering.

Each episode brings you in depth interviews with industry experts and professionals who break down the intricacies of platform architecture, cloud operations and DevOps practices.

From tool reviews to valuable lessons from real world projects, to insights about the best approaches and strategies, you can count on this show to provide you with expert knowledge that will truly elevate your own journey in the world of platform engineering.

Cory: 00:40

Thanks for tuning into this episode of the Platform Engineering Podcast. I'm your host, Cory O'Daniel, and today I have with me Solomon Hykes, who should need absolutely no introduction whatsoever, but I'm here to make it.

‍Solomon is the creator of Docker, also founder of Dagger.io, and I'd say that personally, I've gotten to talk to a lot of people on this show that have changed my life and the way that I write software, but none so much as much as you've kind of changed our industry. So I'm super excited to have you on the show today. Thanks so much for giving me the time.

Solomon: 01:16

Thank you. Well, that was quite an introduction. Thanks for having me.

C‍ory: 01:19

Sorry about that, I didn't give you a heads-up.

Solomon: 01:21

Oh no, It’s great.

C‍ory: 01:24

I went back and watched your... I've actually watched that early demo at PyCon a fair number of times. Was that your first actual demo of Docker to the public?

Solomon: 01:37

The first public demo, yeah.

What happened is we like to build fast and show it as quickly as possible and then iterate. That's kind of our whole thing. We had started doing that in private, so we would just travel to people's offices (back when people had offices) and then show them.

Honestly, the first demos we gave, I think it was fake. It was a mock-up of a UI. If you go back in the Git history of Docker (now the Mobi repo), you'll find a file called fake.go. And that's where we developed a real-looking UI because we hadn't really finished hooking up the internals to running actual containers. We just did that in private a bunch of times.

Then eventually we thought, let's go give this demo to maybe 30 people at a time. Let's go to this conference, PyCon, that we were going to anyway because our main product was a deployment product for web applications and Python was a big target. Python developers were our typical customers, so we were going to go there anyway.

We just got a lightning talk and I assumed it would be like a little room in the back. It was about Linux containers. We thought, how many people at PyCon will care about a lightning talk about Linux containers?

It turns out PyCon does lightning talks on the main stage. That's kind of part of the thing. And I did not know that. So I ended up on the main stage, with way more people watching than I thought, and then gave that demo. And more people were interested than we expected. And that just kind of got the ball rolling.

C‍ory: 03:10

What always blew my mind about watching that is it is a lightning talk and like I was like, I’ve got to go find like the first demo of this. Like I assumed it was going to be like a 30-minute talk, a 45-minute talk and I think you have it like shimmed down to like five minutes, right? And I don't think like the guy that's emceeing, I don't think he got how big it was, because he's just like rushing you off stage and you're like, okay, I guess I’ve got to go now.

Solomon: 03:32

Well, he had a job to do.

C‍ory: 03:34

That must have been surreal. Did it feel like the audience got the gravity of what you were creating? Like did they understand how big the impact was? Or was it still…

Solomon: 03:46

I mean, we had no idea how big the impact would be. We thought it was big. We had a gut feeling that it was big from a technical design point of view and the possibilities were huge. So we were very excited. We felt like we were onto something. But as to how excited the rest of the industry would get, we had no idea.

We had been pursuing this container thing for a long time. And so we were just sort of really excited about it.

Definitely at some point I got more applause and reaction than I expected and I could tell from the vibe in the room that people were into it. That's the extent to which we realized, people are liking this. And then people came to our boot.

The marketing team for dotCloud (which was the name of our company at the time before we pivoted to doing just Docker) had a booth set up and they were there to talk about the benefits of dotCloud to deploy and host your Python applications. It was like a Heroku competitor, except under the hood we used this container stuff. And so they had no idea. People just lined up at their booth and they were like, great, wow. And then people were like, can we talk about Docker? “Oh, you mean the toy thing that we just… the lightning talk? Okay, let me get Solomon.”

So yeah, everyone was surprised.

C‍ory: 05:11

It's such a ubiquitous part of the industry. I mean, local development… we do locally and all of our stuff just stands up in Docker. We don't have a README full of 100 instructions to copy and paste to get somebody started day one. Like, it's ubiquitous, it's everywhere.

‍I feel like developers love it or hate it. Even if you're not running Docker, the products you're using are, right?

‍Looking back, how does that feel? You can literally look at AWS, GCP, Azure… the biggest companies on the planet are running and utilizing your software. What does that feel like as a developer?

Solomon: 05:49

Yeah, I mean, it's obviously really cool and it's also very abstract, even while it was happening at Docker. I spent 10 years on that company and then I left because I needed a break. But the first five years we were crossing the desert, like no one cared. We were doing containers and we were nobody. Nobody cared.

We didn't invent the technology of Linux containers, but we had a lot of original ideas on how that could be used and no one cared about that. We kept doing it anyway because we didn't have anything better to do. And then, almost overnight, this demo gets people's attention.

I guess the market was in the right place, the conditions were right. Then it becomes ridiculous. It's like too much interest all at once. And then it was five more years of just keeping up with that.

As that happened, just keeping up with insane interest in adoption and also competition and like friction and just trying to figure out, “Okay, now this is serious. This impacts people's lives. How do we make decisions, and keep improving it, and balance new features versus stability?” Like a million things to worry about that don't matter if no one's using your product.

But inside the building, it was weird because it doesn't really change that much your day-to-day. You're worrying about bugs. You're talking to users. You're debating which feature you're going to do next. Then you're bikeshedding over an API or something. And then you're doing all the same stuff. And you're super busy, but you were already busy before. So we had this weird disconnect.

And then we would go to DockerCon (the big Docker conference), where everyone went to talk about containers, and it was just a really special event. Then it would be almost like a spiritual experience, especially for new employees. Their first DockerCon was crazy because… you spend the whole year talking about problems and how to fix them because that's our job, right? We're always at the point where things don't work and then we're just trying to fix that. And we kind of forget that behind us there's a long trail of hopefully happy implementations of Docker so far. And then you get the reminder, like people just come and say, “Oh man, I'm using Docker for this, it made things so much easier.”

So that was really nice. But that's the most concrete signal you're going to get that you're having an impact. Like people come to you and say it's awesome. But you can only talk to a small number of them at a time.

C‍ory: 08:27

It's like the perfect founder's dream happening, right?

Solomon: 08:31

In some ways.

Solomon‍: 08:35

So I got a question for you then. I feel like the more cloud-native you are by default, like if you're getting your business started and you're running on AWS, Docker and containers are ubiquitous. But at the same time, there are still so many organizations that are still running significant workloads on VMs.

‍It's surprising to me. I meet customers all the time, where they're like, everything's just running on VMware still and we're going to move to the cloud this year and we're going to Dockerize everything. But I just see so many companies still struggle with jumping that gap.

‍Do you think there are specific use cases where VMs will continue to have that edge where people will continue to lean towards VMs? Or do you think that as we're moving more workloads to the cloud that the container format is the best way of shipping software? Or is there room for VMs? Or is this just like a thing of the past for us?

Solomon: 09:30

I think containers are...there's several dimensions to them. Depending on the dimension you look at, I think the answer is different. And if you mash it all together and just look at containers versus not containers, you lose some important dimensions to answer that question in a useful way.

I think the simplest split is to distinguish containers as an application platform. The way developers think of the pieces in their application. So a logical system for the application.

Then separately you have infrastructure, how the machines are going to allocate work. Whether it's compute, storage, networking, how you scale out, where it runs, the life cycle of a machine. When does the storage appear, disappear? That's all infrastructure.

Containers are really interesting because they're relevant to both. They kind of straddle both, right?

Cory: 10:30

Yeah.

Solomon: 10:30

And so if you go to KubeCon, or if you talk to most of the Kubernetes community, containers are primarily about infrastructure. The real revolutionary aspect is we used to do VMs and now we're doing containers. And I mean, that's 100 % true. It had a huge impact. But it doesn't actually capture all of what containers are.

The other aspect that's just as important is that developers, it gives them a tool for thinking about the architecture of their application. And that's really the dividing line. It's about separation of concerns between developers and infrastructure teams and ops.

I think on the infrastructure side, it's very fragmented. I feel like there's a million ways to run containers. You can run them on bare metal machines, on VMs… there's all sorts of really interesting patterns emerging, but it's mostly hidden from the application.

In particular, the trend that I find really interesting is the line between what is a VM and what is a container, at the infrastructure level, is blurring. Like now you have projects like Firecracker… I forget, I'm really bad at remembering all the project names. But there are ways to run a VM as if it were a container, basically. Or make running a container feel like you're running a VM. So you can get the container tooling and ecosystem, but then when you're running that workload in your data center or in your cloud it has the security, the isolation properties of a VM.

So the lines are blurring. In infrastructure, I guess container versus VM increasingly will become an irrelevant question. Like it'll be both kind of. So you have VM deployments, but why would you not benefit from the ecosystem of container tooling? Like there's no downside, right?

On the application level, it's different. It's a war of the platforms. It's a war between platforms on getting developers to target their platform. So there it's more about, am I targeting Linux as my platform? Because containers are really Linux containers. Most of the time, it's not Docker, it's Linux/Docker, you know?

Cory: 12:41

Yeah.

Solomon: 12:42

Linux/OCI. It's an extension of Linux. So if your platform is Linux, it's a web app, it's going to run in a Linux server, then containers are hard to beat. It's like the ultimate form of Linux as a platform. It's undistinguishable. But there are other platforms out there that hide containers from you. Like these CDNs, like Cloudflare, Fastly, others. They have these cool serverless workers platforms. That could be targeting WebAssembly or JavaScript isolates. There may still be containers underneath at the infrastructure layer, but as a developer, you don't care anymore.

You’ve got the successors of Heroku, like Vercel, Netlify, these more front-end centric platforms. Same thing, maybe they run containers, maybe they don't, but as a developer, you don't have to worry about that. And on that side, what I'm seeing is extreme fragmentation. It feels like every week there's a new platform, a specialized platform, and there's no winner takes all there. It's going to get more and more specialized.

That was kind of a long answer. On the infrastructure side, they're merging. There's just like one substrate that's the best of VMs and containers. And at the application layer, it's the opposite, like explosion of choice, fragmentation all over the place. But it's going to all run in containers under the hood anyway on this containers/VM combo.

C‍ory: 14:08

Yeah. The proliferation of the platforms out there, I find kind of interesting. I love it and hate it at the same time. It's one of those things.

‍Like Heroku to me, early on when I first got that first Heroku experience, I was like, “Oh my God, I literally just get to focus on my software.” I was so excited about it.

‍I feel like we've been in the cloud for such a long time as a community. It's like when you actually can say, “Hey, I actually can get my product off the ground and I didn't have to do any cloud stuff whatsoever.” That's freeing to many engineers without that experience.

‍But then at the same time, my Ops heart is like, somebody's going to have to migrate you when you guys hit a billion dollars and that's going to be rough. But yeah, I really like seeing some of the platforms that are out there.

‍So when you were first bringing Docker and containers to the masses, I'm curious, like what was the biggest hurdle that you saw companies struggling with then, moving from VMs or running on bare metal to containers? Like what was the biggest pushback?

Solomon: 15:12

Well, we went through several phases. Initially, it was just a completely new category. And so for the longest time, while I was there, it really was about explaining the container model. Explaining containers, what they are, why they're useful, how to compare them to what you have.

You kind of had to buy into a completely new ecosystem, right? Life was good. You had storage, compute, networking, logging, security tools, whatever, like a whole suite of products for infrastructure. And now all of a sudden you're supposed to revisit all of it.

Some of those products you were unhappy with so maybe problems will be solved, maybe things will be faster or more scalable or whatever. But other products, you weren't planning on revisiting. Re-platforming is disruptive.

Once this movement started, a lot of people got sort of forced to pay attention and learn. And you know, depending on where you are in your business priorities or in your career, you may not be happy about that.

So the transition from the early adopters that were just by definition excited about containers… they were the ones who understood, they experienced the pain of not having containers, and they understood the potential if they could just get everyone to switch to containers… that was the fun part for us.

We were making things happen. We were changing how applications get deployed. But all the stuff we take for granted now, by the way… like we call cloud native, et cetera… I think especially younger engineers, they've never known how it was before.

Cory: 17:05

Yeah.

Solomon: 17:06

Like the concept of building a container around the application instead of starting from a server that you had to upload files into and kind of change from the inside. Like that's just sort of how things work now if you're out of school this year. Like, “Well, of course I build a container around the app. What are you talking about? Like a build a server, like a little server around the app. That's how like, what's the… I don't understand. What are you talking about? How would you do it?”

But we have the opposite problem. Like, “What do mean build a server around the app? Like it's supposed to be the other way around.” Or even little things like the Docker file that came a little bit later that in itself was like an epiphany.

First we had Docker, you could run, run, run, and then you could commit the states of each container after each change. You can look at it from the outside, it's this movable unit and it's lightweight enough that you can do that, unlike a VM. And then, you know, basically this idea emerged, “Oh wait, we could actually use this to build. It's so cheap and fast, plus we could do some caching.”

And then we experimented with this very basic Docker file, like you can see still how basic it is. You could see it came out of a prototype and it just never really… we never got a chance to finish it because as soon as people saw it they were like, I want this right now.

In the community, the enthusiasts said, “I want this right now. This is perfect.” And here we are stuck with Docker files everywhere. But their colleagues who were forced to look at this and be like, “Okay, now I have to learn this new thing. Great.” They really struggled to understand why.

It's just a really different model. You look at a Docker file, it gives you a list of things to do from the beginning and then you're rebuilding again and again. In the pre-Docker world, if you have a list of 10 instructions to follow, then if you rerun these, the assumption is, it's going to redo everything.

It's going to re-download the damn image and then it's going to re-execute apt-get install this. It's going to rerun everything. It's going to be take forever. This is such a dumb way to do this. The state of the art is you have a lot of custom caching logic. Like, okay, first you check if you've downloaded the image and if you haven't, then you download it. Like that was the state of the art.

But we just did that automatically. Right? So all you have to do is write the 10 things you want it to do in order. And then Docker would figure out which ones needed to be done again or not. And that required such a change in how you thought about things. That was a painful transition.

So imagine that multiplied by everything containers touch: storage, monitoring, you know…

“What do you mean there might be a thousand more of these little servers you call containers in 10 seconds? But it takes me 10 minutes to manually hook up monitoring for each server. I have 20 servers. That's the number of servers I have. You're telling me I don't know how many servers I'm going to have?”

The biggest challenge was educating on the paradigm shift. After that it was just missing features, competitors, you know…

C‍ory: 20:32

I actually remember the first time I saw a Docker file. It was funny because we had this small team, like 12 people or so, and we had a QA person and I got a pull request where I saw our README just had like hundreds of lines removed from it. And it was from the QA guy. And I was like why did he remove all the setup instructions from the README? Where are they now? What files are they going to be in? Is he just moving it to another Markdown file? (It might have even been Textile at the time. I don't even know if Markdown existed yet. I think Markdown existed.) Then, all of sudden, I see this other file. I was like, “Docker file? Why did he move all the instructions there?” And then it just hit me and I was like, “Oh my gosh, I can just run my dev environment with a single command.”

‍I was like, this is amazing. This is the best QA person I've ever met. Merge, merge, merge, merge. [Solomon laughs]

C‍ory: 21:30

Very cool.

Solomon: 21:31

That's cool.

Host read ad: 21:35

Ops teams, you're probably used to doing all the heavy lifting when it comes to infrastructure as code wrangling root modules, CI/CD scripts and Terraform, just to keep things moving along. What if your developers could just diagram what they want and you still got all the control and visibility you need?

That's exactly what Massdriver does. Ops teams upload your trusted infrastructure as code modules to our registry.

Your developers, they don't have to touch Terraform, build root modules, or even copy a single line of CI/CD scripts. They just diagram their cloud infrastructure. Massdriver pulls the modules and deploys exactly what's on their canvas.

The result, it's still managed as code, but with complete audit trails, rollbacks, preview environments and cost controls, you'll see exactly who's using what, where and what resources they're producing what all without the chaos. Stop doing twice the work. Start making infrastructure as code simpler with Massdriver. Learn more at Massdriver.Cloud.

C‍ory: 22:32

So speaking of QA, you've moved on to another project that hits the testing world pretty hard. That's Dagger. ‍So, can you tell us a little bit about Dagger and the problem it's addressing and the inspiration behind it?

Solomon: 22:47

One way to explain Dagger is it's sort of the continuation of what we were trying to do with Docker. So, a never-ending quest to help developers save time, help you ship faster, which is what good tools should do.

That was the focus at Docker. So much time wasted on things other than building the app and deploying it, and then learning from your users and building again. You should be building the product, talking to users. That's what you should be doing. But you're spending all this time on other tasks that are a waste of your time. So insufficient automation.

The analogy I like to use is like a software factory. You start a project, it's like an artisanal workshop - You're just building stuff and figuring out, you're just winging it and you're moving fast. Then as your project grows, if you're lucky enough to have that problem, then there's more people on the team, more users, more scale, more complexity. And so it becomes really hard to keep that workshop going in an artisanal way. You need to start streamlining and automating. So whether you like it or not, the workshop becomes a factory. It becomes more automated, more standardized.

How you define that factory kind of defines how fast you're going to be able to go. The cost right now when you automate is that you have to lock everything down, you have to kind of define a monolithic standard. So you have to say this is the build tool we're going to use, this is the language stack, this is the CI. And you just sort of say this is how we're going to keep shipping fast at scale. But then you lose agility because you can't easily adapt, right?

Then over time what happens is (if you keep growing) you're going to acquire a company, you're going to build a new feature that requires a completely different set of tools, like an AI feature. Things will happen where your monolithic standard doesn't work anymore for everything. So you're going to have competing standards in the factory. So you have this more messy kind of organically growing factory and so that's how you get agility but now things slow down, things are complicated again.

This is a universal problem. This process of the software factory appearing and then sort of becoming more complex and things gradually slowing down. Today, the state of the art for solving this is that you just throw money at the problem by hiring smart engineers to keep it running anyway. And also throwing bigger and bigger servers at it to run the pipelines as they slow down. So that's kind of the state of the art.

Dagger's goal is to help fix that by making all those pipelines that build and test and deploy your app just way simpler and more modular so that you can still have your standard, but it's a modular standard.

So it's like a factory made of Lego. You have a standard. There's one way to do things. But at any time you can swap components out. You can customize and you can keep the factory kind of evolving alongside the product.

It really all boils down to velocity. You want speed and agility even as you grow. And that's just impossibly hard to get right now. Especially now that capital is more scarce. Most companies can't afford to hire 20 or 50 or 100 of the world's best SREs and throw them at the problem. You know, build us a better platform. You just can't, it's not cost efficient.

So that's kind of the high-level, lofty story here. We make your software factories better by making them more modular, like Lego. In practice, the less hand-wavy engineering answer is Dagger is an open source engine that runs your pipelines - build, test, deployment, whatever, a pipeline's a pipeline - and it runs them in containers. And it runs them in an API first way. So there's an API for all of it. The API has SDKs in several languages. You can drive that API from the command line. And it all runs locally as well as in CI.

That part's key because now you have an engine that lets you program your pipelines, run them in containers, and run them locally, which means that you can iterate on the pipeline itself as quickly as you can iterate on the app that it's going to ship.

Our typical user is like a DevOps engineer, an SRE, whoever's in charge of CI. And their biggest problem is that they have to go through what we call “Push and Pray.”

C‍ory: 27:45

Ooooh, I know…

Solomon: 27:47

You know what I'm talking about?

C‍ory: 27:49

I think I know what you're talking about.

‍OK. Hold on. We're going do something. Everybody that's listening, stop what you're doing - unless you're driving. Keep driving. If you've ever, I don't know, let's say fixed a bug in a single commit, but it required a CI change, and you had 15 commits afterwards to get that GitHub action passed… raise your hand.

‍Then look around and see if anybody else

Solomon: 28:16

Unless you’re driving

Cory: 28:17

Unless you're driving... there's probably nobody else around you raising their hand right now and that's okay, but everyone feels it. That's what you're talking about, right?

‍You push it and you're like, “Agh, I got that YAML config wrong.” Push it again, it's still wrong. It's like, it took me five minutes to fix this bug in real code and it took me 25 pushes and like three hours of twiddling my thumbs watching a runner go, “Agh, you got it wrong.”

Solomon: 28:42

That's exactly what it is. And everyone goes through it. It doesn't matter if it's a two-person startup or like enterprise, whatever. Everyone has this damn problem. And what's funny is that it's the same problem applications used to have before Docker: “Oh wait, it doesn't work on my machine, but it works on the server. But it's the staging server. Can we get another one?” “Oh no, that'll take a week, you know, it's like a special snowflake.”

So that's CI today. And CI is really the beating heart of that software factory today, right?

Cory: 29:19

Yeah.

Solomon: 29:20

I mean, it's CI plus a lot of shell scripts, plus some Docker compose, YAML, some Docker files, some Groovy, and then just a lot of glue, right?

So I started with marketing speak, but it's really helpful to have a nice simple metaphor to explain to everyone why this matters. Software factories - that's what it is.

Cory: 29:44

Yeah.

Solomon: 29:45

In practice, the biggest problem right now is these pipelines - you can't run them the same everywhere. And it's killing velocity. And the bigger the product, the worse it gets. So we're just starting from the Lego brick, this Dagger engine, and our goal is to just fix the whole problem from the ground up, starting from first principles.

So we're kind of taking the long winding road. It was the same with Docker. We don't mind doing that because another layer of hacks with a quick and easy business on top of it is just not going to cut it.

If you look at that Market category, the software factory - First of all, it doesn't exist. Maybe application delivery software supply chain would be the closest thing. But it's basically incredibly fragmented. So if you look at the ‘marquitecture’, like the map of actual vendors and their credit categories and how they position themselves compared to each other, it looks like a super neat, well segmented map.

Here's CI. Here's CD. Configuration management. Paths. IT automation (that would be Ansible and stuff like that). What am I missing? Infrastructure as Code. Build. Container build. Java build. Whatever Bazel is.

Cory: 31:05

[Laughs] Whatever Bazel is.

Solomon: 31:07

Linux distributions. Nix. Docker compose. The dev environment stuff - that's pretty hot right now. Dev environments. Data engineering. ML Ops.

It looks so neatly separated. And then you go and talk to an actual engineering team that's actually shipping product at scale, and you ask for a diagram of how they do all these things. And it's like a huge spaghetti ball that they're trying to make sense of. It's just boxes and arrows intertwined everywhere. You're lucky if you can even have a cohesive map of all the boxes and all the arrows in one place. Today, only the best engineering teams have that.

Most of the time, what you have is, “Okay, this is what my team does. And then I just take the thing and I put it in the thing here, and I don't know what happens after that.” That's the state of the art today.

C‍ory: 32:06

Yeah.

Solomon: 32:07

And it's all held together by shell scripts. There's no platform underneath that will show you all of it and say, this is what's going on, this is your DAG.

‍So the DAG, that's kind of the magical word for us, for our community.

C‍ory: 32:23

Oh my gosh. So, hold on, you’re talking about Directed Acyclic Graph for people that... Wait, is that where the name Dagger comes from?

Solomon: 32:32

Yes, that's correct. That's where it comes from.

C‍ory: 32:33

Oh my god, that's so good. I heard about you guys like a year or two ago, and I was like, “I wonder where Dagger comes from. That's an interesting name for it.” Okay, there we go. Oh my gosh. Everybody just got a wrinkle in their brain. That's awesome.

Solomon: 32:52

So we're very community centric. It's open source. There's a discord. It's very intensely community driven - like Docker, but 10X. We just took everything we liked doing at Docker and did everything 10X.

Imagine a discord full of DAG Nerds (people will see DAGs everywhere) and we're just trying to unify the DAG. We have this joke saying ‘One DAG’, you know, like “One love”.

You start from the build. You pull a CI pipeline (and that's a DAG, it should be a DAG), then you pull on that and eventually you get into the build, your data pipeline, your deployment plan - in the end it's all one DAG.

So our goal is to gradually give you a platform that's like Lego. You know, Lego-like modularity where you can actually gradually run your whole DAG, or at least model it.

That informs a lot of our constraints. Like, okay, if that's the goal, then it has to integrate with everything. We can't start and say, throw away what you have and let's start over. No, the idea is the tool adapts to what you have and then you use it a little bit here, a little bit there.

And guess what? That's how Docker also got adopted. Docker never said, “Throw away your stack, you need to be worthy of the tool.” We said, okay, where can we help? We'll take it one step at a time. So Dagger's the same.

C‍ory: 34:22

Yeah, I love a crawl-walk-run. That is one of the things I feel is so hard… when you're on one of these teams, it’s like, everything kind of sucks, we want to make it better, but like we don't have time. Like it sucks so much it’s hard to get your nose above water. And then you see something and you're like, “Aah, if I had the time, I know that would save me a bunch of time.” But that initial investment to get the thing that saves you time so you can reinvest and not having that crawl-walk-run in a product is so hard.

‍So, let's say I'm in a team today and whether I've got a bunch of stuff in Jenkins or I've just got the Iliad of YAML in my GitHub actions. What's my first baby step with Dagger?

Solomon: 35:07

Usually there's always the one pipeline that's a ‘hair on fire’ problem and it depends on the team. Some common signs are it's too slow to run. Like there's a serious performance problem and we can't throw any more compute at it. We have to sort of dig in and understand why it's slow and kind of refactor it. And the refactoring is too painful because it's a pile of scripts and it's just a nightmare.

Another one is we have to change it - a change of tool chain, a change of CI provider, some sort of change that we can't avoid anymore. And again, we're back to, okay, someone's going to have to re-engineer this pipeline and again it's a bunch of shell scripts and like it's three layers - it's a turducken of Shell, YAML, more Shell, more YAML. Like it's crazy what's out there. And let’s say the engineer who built it is gone. So no-one actually wants to touch the thing.

Cory: 36:10

That engineer's definitely gone.

Solomon: 36:12

Yeah. The engineer’s gone and maybe they at that were really excited about Haskell and they wrote in Haskell. That stuff happens all the time. They use the Haskell to templatize a Docker file. So it's a Haskell tool to generate a Docker file from a template. The template language is custom, they invented it. And then that just keeps going. Who wants to refactor that? So that's usually a sign, stuff like that.

Integration tests are really interesting because they get really complicated really quickly. You need a lot of glue to set up the dependencies. Anything that could be solved by containers, but you need to kind of program how the containers are set up and orchestrated, and you need to do it in a portable way.

So when you start hitting the limits of a Docker compose file… like, “Oh, Docker Compose, but if only I could script it and program it.”

There are some known blockers where someone says, “Okay, this pipeline is just too much, I need to run it locally.” It's slowing the dev team down that they can't run this particular integration test suite or this particular build… that just is a little custom that they can't run that in the dev loop… they have to commit, push, wait. You know, it's slowing them down.

Sometimes it's just not possible because the CI is missing something. Like they don't have the new tool yet. They're waiting on the DevOps team to add the new version of the whatever.

So it's all the typical pre-container problems, but for the pipeline. So the point is usually it's one pipeline, it's not all of them. And so you just do it for that pipeline. You Daggerize it, as we call it.

C‍ory: 37:59

Gotta Daggerize it.

Solomon: 38:00

And then you get that pull request with a lot of removed lines like you got with the README, except that instead of removing lines from the README, it's removing lines from the CI configuration basically.

C‍ory: 38:10

So I feel like early stage companies, probably the people that are using Dagger are the engineers because they're the only people there. But in like later stage companies, when you start to have the many shades of DevOps, for lack of a better term - like I got SREs, platform engineers, Ops, DevOps, NetOps, MLOps, the QA guy - Who is the person that you see today that is like, “That. We need to get that in here now.”?

Solomon: 38:35

First of all, in terms of org chart and titles, it's exactly what you said. It's all over the place. In smaller teams, there are two scenarios.

One is the central platform team, platform-ish team, is modernizing. They're upgrading the platform, they want to do it right, they find Dagger and they build a platform that's based on Dagger. Soon, maybe, I'll be able to name specific names, but there are companies that are well-known that are going through this process right now. Where they decided we're going to use this Lego, we're going to standardize on this Lego, we're going to build a new version of our factory. It's going to be ours, custom to us, but built on this Lego that is Dagger. So that typically comes from a central team. Whoever owns CI usually.

C‍ory: 39:22

Okay.

Solomon: 39:23

But the key is they overlap there at the border of App and Infra.

Usually once App teams start using it, they can't get enough of it because they get involved in the development of their pipelines. You know, that's the big difference. And the Infra team on the platform, the central infra team - it's huge because you can push a lot of the work to the Dev teams and you can stop being a bottleneck. Because if you have a hundred Dev teams and one platform team, that's a lot of different dev tool chains that you have to go and translate into stupid Jenkins libraries - no offense to Jenkins. Everyone hates YAML, and then you have teams who hate Groovy and they wish they had YAML, you know.

C‍ory: 40:11

Yeah.

Solomon: 40:12

In larger teams, that's what happens. It comes from the central platform team.

Then for startups and smaller organizations, it comes from the Devs directly. And usually there's one - we call them the designated DevOps person - there's one person who just gets stuck with the builds, you know, the CI.

Cory: 40:29

Yes, yes!

Solomon: 40:30

Yeah. And sometimes it could be something silly. Like they just said, “Okay, I'll do it.” And then everyone was like, “That's not my chore anymore. You know, that's Bob, he's doing the CI.” And Bob is like, “Well I didn't really want to do it, but I guess I'll keep doing it.” So it's like a chore, right?

C‍ory: 40:47

It's funny you say that, the designated DevOps person, that's literally how my career started.

Solomon: 40:53

Yeah, that was you?

C‍ory: 40:54

I was in Ops, I was like I am going to be a software developer. I changed careers and then like six months into my first job, my boss… it was when AWS was first rolling out… he's like, you're going to work with the Ops team to figure out how to take all these developers and all these Ops people and do some cloud stuff. And I was like, “That's not what I signed up for.”

‍I feel like there was this moment where (and this moment I guess still exists)...

‍Some people are probably going to hate what I'm about to say, but I'm going to go for it anyway.

‍I love pre-commit, the tool pre-commit. I don't know if it was around at the time that GitHub Actions came out, but it really felt like people moved away from it. I remember the big appeal of it early on was, especially going back to pre-GitHub Actions, pre-having decent CI options - Sorry, CI companies, you're all decent.

‍It was like I can't do this one thing in CircleCI or whatever. We've got a pre-commit that does our formatting or linting or whatever. But then it just became like, why isn't all of that just happening in CI? And so that's like, yeah, just kind of have consecutive actions now. And now it's like, you'll push something a lot of times, then the build will fail and you're like, “Agh, it's the linter. I didn't go format. If I had pre-commit installed, I would.” But I feel like a lot of people have moved away from using pre-commit because they're like, “Well, I'm split-brained. Half my CI is here, half my CI is there.”

‍I feel like with a CI that can actually run locally and save us that ‘push and pray’ problem, we can also start to get to a point where… I mean, despite how much faster Dagger makes the builds, the speed impact of not doing that extra work because you push something that was silly… pre-commit would have caught it or Dagger would have caught it. Like it was a format, the code works, the tests pass. Everything's great. Hell, it even built the image. But it wasn't formatted properly so it bailed, right?

‍I feel like a lot of that has gone away. What's funny about it is, I feel like every time I see benchmarks on GitHub Actions it's like, all of my CI stuff actually runs much faster if you execute those commands locally because I have so many more cores.

Solomon: 43:06

Yeah, that's the thing. I think something happened to the word CI along the way, because CI is a thing you do, it's continuous integration. You're continuously integrating. It's literally a continuous task. But somewhere along the way it became a place, right? Like CI is that server over there, that cluster over there where you do things.

So the concept of before CI or in CI, right? Outside of CI or in CI was a mistake. It's supposed to be continuous. What do you mean before? Before a thing that's continuous, what does that even mean?

One thing we're struggling with is it's the same journey of education as we had with Docker. We're trying to bring the word CI back to its roots. It just means you're continuously integrating. It didn't say anything about on what machine. Just continuously integrate the piece that makes sense on the machine that makes sense. And, like you said, these days, developer laptops are pretty damn powerful.

We talk to users that actually have integration tests that cannot fully run in CI. A lot of times it's the other way around, because a nice scalable cloud cluster is also nice. Or too expensive… but you paid for this hardware. If you've got a test suite that could run on that hardware and you can run it 10 times faster and more frequently than you would in the server over there, why wouldn't you do it?

Our goal is not to change where the CI place is, to say, “Well, CI used to be on the server, now it's on your laptop.” We're saying it shouldn't be a place at all. All your compute should be available to run any piece of your CI that it makes sense to, and you should be able to change that over time also. That's the idea.

Cory: 44:57

I love that - it's continuous integration. It's like I know what CI stands for, but like you're so right. We did just kind of be like…

Solomon: 45:04

You kind of forgot.

Cory: 45:05

CI, that's a product I buy and it does that continuous thing for me. It's like, no, it's the entire process…

Solomon: 45:14

Honestly, for us, there's a marketing and a positioning puzzle that we haven't solved yet. Like we don't know if the word CI can be saved or not. We could redeem CI and say, look, this is how CI was supposed to be. That's what we stand for at Dagger, you know, we're going to claim CI as something that could be better. Or, if we just can't get enough people to kind of change how they understand the word, then we'll just pick another word.

But it's not really up to us. I mean, we're out there explaining it (like we're discussing it now), but it's going to boil down to what sticks in people's brains and how easy it is to change the meaning of a word that's been in people's brains for 15 years now.

So we'll see, I mean, either way, we're going to keep doing the same thing. But you can't really control, I think, what things will end up being named. It's hard to predict what sticks. So our approach is to try and not let ourselves be defined by the buzzword, because our mission doesn't change because a new term becomes fashionable for it.

That's why the whole DevOps versus Platform, et cetera, - I always stayed away from that. It seems like it's the same factory to me, you know, just different approaches maybe.

C‍ory: 46:34

That's why I was laughing as you were saying… I was like, hey, I know personally that that journey is hard of like trying to reclaim a world.

‍We all saw this happen over the past 15 years - the word DevOps has been just bastardized as far as like what it means.

Solomon: 46:47

Right.

C‍ory: 46:49

Somebody somewhere is like, “Well, I know what it means.” I'm like, “Sure. But if nobody else agrees with you, then you don't.” Right? Like that's where we've ended up.

‍And I feel like, honestly, platform engineering has kind of gone the same way. Like I meet plenty of teams where their definition of what it is varies wildly.

Solomon: 47:05

To me, that's just how words work. Like you have to accept it. And then, who knows, we might be pleasantly surprised.

Like the DevOps example is perfect because there was a specific meaning by the people who created the word. I honestly have followed so many versions of the debate, I kind of forget, but I know that's true, that's a fact. Some people came up with it and they had a definition and they have an opinion. The thing is, at some point, you don't own words, right? The people using the word collectively own the word and sometimes they just change what the word means. There's a whole field of study for that, for the history of the meaning of words. This word used to mean this in the 15th century… that's the same thing. So you just have to be aware of it and accept it, I say.

C‍ory: 47:52

You know everything's going great when there's inevitably an article or a blog post where it's like, “Dagger is changing the definition of CI. Continuous integration can start now.” You're like, yes.

Solomon: 48:07

Yeah, I'll know we're succeeding when people get really mad at us. That's when I'll know, okay, we're onto something.

“How dare they?” - Okay, nice.

Cory: 48:15

[Laughs] I must be on the way to a series A right now then.

‍You've been now involved in two open source projects that have spanned well over a decade. There's so much going on in the open source world now around licensing. We saw the WordPress fiasco. There was a Terraform Open Tofu thing (which we're involved in Open Tofu).

‍I would love to know, just based on everything that happened with open source and Docker and the Mobi project, has anything changed about the way that you approach open source and licensing of your projects, given what you've learned and what you've seen happen in the community over the past few years?

Solomon: 48:59

Yeah, I think so. It was a learning experience for me for sure. You mentioned earlier that it was like every founder's dream what happened in Docker, and I said well in some ways and in other ways not so much because there was so much interest so quickly. We became so disruptive so quickly… including to some very large tech companies that became threatened by what we were doing… it was a mad scramble to compete and get a slice of that pie.

Disruption and impact and tech means, okay, someone's going to make money, someone's going to lose money - if there's a business involved. So if you're making money selling an operating system or a data center OS or a storage solution, like in some way you're impacted by this shift to containers. And so you're going to make damn sure you're impacted in the right way.

The reason this is relevant to your question in open source is we were very naive… arrogant in some ways, like, “Yeah, of course our design's better. Like, what do you think? Like containers are awesome, you'll see.”

You know, that arrogance, in some ways, served us because we dared to do something completely different and ignore advice to just stick to what existed. But we were also naive in that we assumed since so many people in the community clearly loved the product and were appreciative of the fact that it was all open source (we were just opening it all)... We assumed the love would continue and everyone would continue to love us no matter what.

Also it became really important for us that everyone loves us. So as engineers, we started caring too much about the opinion of every other engineer. The opinion they have of us, our work, the aesthetics of it, the ethics of it, the morality of it, whatever.

The problem is, if you grow enough and you have enough impact, it's impossible for everyone to love you. People have a range of opinions and experiences. You're just going to have haters, first of all. And then you're going to have a lot of disagreements on how open should it be? You know, should it be controlled by one company?

On top of that, because of the business competition, the competitive dynamics, that open discourse is going to be poisoned a little bit by bad-faith participants. Specifically, competitors speaking on behalf of the open source community to try and steer the perceived opinion of the community towards their business interests.

Cory: 51:31

Yeah.

Solomon: 51:32

And the problem is, if you're sitting in the middle and you're getting all this feedback… first of all, this is all new. So we took years to figure out that was happening. And also, it's all mixed together.

You're getting negative feedback all the time, which is normal. This is broken. This is wrong. This doesn't scale. This is not secure. You know, whatever. And it's our job to listen to that and fix it. But then that negative feedback is mixed with negative feedback from people who are not actually users. They're impacted because they need to put Docker on their product because their customers are asking for it. They themselves are not part of this community, they're not here to be excited with us about containers. They're here to make sure they make their quarter. But, of course, they're not going to come up and say that.

Anyway, the big lesson for me is, if you make something open source that's made for businesses, eventually, if you're successful, the open source community dynamics will mix with the business dynamics. You just have to be aware of that and have a system for dealing with it and be prepared.

C‍ory: 52:42

Yeah.

‍Well, I really appreciate that. I know that we're over on time. And again, I'm so thankful you came on the show today. I know you're a busy man.

Solomon: 52:50

It's my pleasure. I talk a lot. I give long answers.

Cory: 52:56

If you want to come back and just do a four-hour show, I'll do a four-hour show.

Solomon: 53:00

I will come back.

C‍ory: 53:02

I don't ever shut up.

‍Where can people learn about Dagger? Where's the best place to go to get started? And how do they find you online?

Solomon: 53:09

Dagger.io, that's our website. Strongly encourage anyone who's into this kind of problem to join our Discord. It's really a killer feature of the whole platform. It's just full of really nice people that are obsessed with DAGs and running pipelines and containers and avoiding ‘push and pray’, and all the good stuff we talked about. It's worth joining even before you have a use case for the tool.

Yeah, come tell us what you're thinking about, why you're thinking about Dagger, what's the problem you're facing. And probably before we even have the time to reply, other users will reply first because it's just the community we have. It's really fun.

Cory: 53:47

Nice. That is awesome.

Solomon: 53:48

I don't know if I mentioned this, but that Discord is our office. Like there's no separate Slack. We're in the public Discord. We have a few private channels, but it's 25 of us and we're literally there all the time. So it's like visiting us at our office.

C‍ory: 54:04

Very cool, very cool.

‍You did a live stream with one of your buddies, I think, a couple of back… with the KubeSimplify guys, right? Like a two-hour live stream, I'll put that in the show notes. So if you want like a real deep dive and you just want to get in, like you want to pop open your console, start working and get guided by the man himself - we'll put that in the show notes so you can follow along.

Solomon: 54:25

Yeah. We also have a YouTube channel. We do a community call every two weeks, and users come in and give demos and show what they're doing with Dagger. And then we put all of those clips on YouTube so you can watch live demos.

C‍ory: 54:37

Oh, nice!

Solomon: 54:38

We do it live on YouTube. We actually had one this morning. So it's up now. But yeah, every two weeks and that's a lot of fun too.

Cory: 54:47

Awesome, we'll put links to that in the show notes as well.

‍Solomon, thanks so much again for coming on this show today. I really appreciate it.

Solomon: 54:53

Thank you, it was my pleasure. Until next time for the four-hour episode.

Cory: 54:57

Yes, exactly. Awesome. Well, thanks for tuning into the Platform Engineering Podcast. We'll see you next time. Thanks so much.

Episode 18

30th Oct 2024

Solomon Hykes on Docker, Dagger, and the Future of DevOps

Transcript

Listen for free

About the Podcast