Foundations of The Cloud With Adam Jacob, Chef

Episode Description

In this second episode of the Platform Engineering Podcast’s special series on the Foundations of The Cloud, Cory O’Daniel meets up with Adam Jacob, co-founder of Chef and System Initiative. They discuss his early interest in infrastructure and automation, the development and impact of Chef in the DevOps community, and his transition to becoming a CEO. The conversation emphasizes the community and technological advancements Chef brought to the industry and introduces the ambitious goals of his current project, System Initiative. Don't miss this insightful conversation—tune in now to discover the future of infrastructure and automation!

Episode Transcript

Thanks for tuning into the Platform Engineering Podcast. I'm your host, Cory O'Daniel, and with me is Adam Jacob, co-founder of Chef and System Initiative. Thanks for coming, Adam.

Yeah, thanks.

Adam's Journey in Infrastructure and Automation

Before we get it started today, can you share a bit about your background and what sparked your interest in infrastructure automation and management?

I started my career as a systems administrator when I was a kid. My parents bought a 286 with a modem, and I discovered bulletin boards. I thought that was the coolest thing in the world. I became obsessed with running bulletin boards and FidoNet nodes, which meant that when the Internet started to emerge in the United States, I was one of the few who knew how modems worked and how to run the systems. Naturally, I fell into running ISPs. I ran ISPs for a long time, and eventually, as everyone got online, we had to build applications. I then shifted to being a systems administrator managing production web applications and large corporate infrastructure. After that, I became a consultant specializing in fully automated infrastructure. Our consultancy handled application deployment, monitoring, trending, configuration management, and provisioning, all as part of a fully automated package. This work led us to create Chef, which we launched as a venture-backed startup. I was roughly the CTO of Chef for most of its duration, though I held various titles. Thinking of me as the CTO is not wrong. Now, at System Initiative, I am the CEO and still focus on building automation. At heart, I am a systems administrator, and everything else I do stems from that initial love of building systems.

Let me ask you a question about that. Is this your first time as a CEO?

Yeah.

Transitioning to CEO: Challenges and Insights

What was that transition like for you going from a mostly technical role to being a CEO of a startup?

Well, I mean, being the CEO of a startup with 17 people, where you're building a deeply technical product, is challenging. Today, we're getting closer and closer to having System Initiative ready for production workloads, and, you know, God willing, we’re just days away. There's a list of bugs, and we're burning through those bugs and coordinating with the team. I’m doing product work. Startups are always a journey where everybody does a little bit of everything. Depending on your talent and skills, you may be called upon to do one thing or another. What I learned from watching a bunch of great people be the CEO of Chef and others in the industry over the years is that being a CEO is primarily a judgment game. Being a CTO is similar in that there isn’t a strict script for what’s right or wrong. It’s not like there's a rulebook saying a CEO should never do this or a CTO should never do that… People love to say that kind of stuff. Can I swear? Just swearing?

Oh yeah.

To say shit like that is mostly not real because it's usually coaching from the outcome. When people tell you stuff like that, it's usually because, well, I tried it and this bad thing happened to me. And, you know, fair enough, there can be wisdom to be found there. But for the most part, it's more about figuring out how you think about building a company. How do you think about constructing teams? How do you think about building a culture that helps people understand what we're here to do, why it matters, why their work connects to why it matters, and how to inspire people to achieve greatness for themselves and, therefore, also for the company and the product? And that's so fun. So, you know, as a transition, it's a delight because I love doing that work, and it's so interesting and fun to do. And then, you know, a lot of the work that you think of as CEO work is about ensuring you have enough capital, making sure you can pay everybody. There's a lot of that stuff. I have a co-founder who is fantastic and takes care of many of those things for us.

So are you actively doing development still, or are you just so caught up on, like, the product side?

Yeah.

Nice.

I try to stay out of the critical path, but sometimes you can and sometimes you can't. I love building products and being an engineer. Part of what I learned during my journey at Chef was that I enjoy the whole game. Some people prefer certain aspects more than others. They might say, "Oh, I don't really like selling, but I like writing code," or "I don't like product." I like all of it. I like marketing, I like sales—I just like all of it. I think it's fun and I enjoy the structure of it. So, yeah, I predict that I will remain a person who writes code forever, regardless of how big or successful the company becomes. I might not be doing it every day, but if you have the skill, I think it's valuable and important to touch the truth of the product you build and understand what it does and doesn't do for people.

Understanding Chef: Origins and Impact

I like that. So for people who are unfamiliar with Chef, can you give a short description of what Chef is and where the idea came from?

Chef is a configuration management system. It has a lovely history, starting with CFEngine 2, which is the primary branch of ancestry for Chef. Puppet was another entrant into that market. CFEngine 2 was the first open-source configuration management system that felt modern and flexible. Many of us built large automated infrastructures on top of CFEngine. Puppet brought in a different perspective; CFEngine was constrained in terms of how you thought about when things happened. The big moment for CFEngine 2 was the idea of idempotent and convergent operations, where the order of operations didn’t really matter. What mattered was that the operations were idempotent, and the system would eventually self-heal. It thought about the world in terms of types of things you needed to do, like install packages or configure files.
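The idempotent-and-convergent idea can be sketched in a few lines of plain Ruby (a toy illustration of the concept, not CFEngine or Chef code; the `converge_file` helper is invented for this example): an operation inspects the current state, acts only when it differs from the desired state, and is therefore safe to run any number of times.

```ruby
require "tempfile"

# Toy convergent "file content" resource: brings the file to the desired
# state, and does nothing if it is already there. Running it repeatedly
# converges the system rather than re-applying side effects.
def converge_file(path, desired_content)
  current = File.exist?(path) ? File.read(path) : nil
  return :unchanged if current == desired_content  # already converged
  File.write(path, desired_content)                # heal toward desired state
  :updated
end

f = Tempfile.new("motd")
puts converge_file(f.path, "hello\n")  # first run applies the change
puts converge_file(f.path, "hello\n")  # second run is a no-op
```

Because each resource is self-healing like this, the order of a run matters much less, and re-running the whole configuration is how the system recovers from drift.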

Puppet started to think of the world in terms of resources that needed to be in a particular state, constructing a big graph of all the resources to be managed on a given operating system. For CFEngine, Puppet, Chef, and even Ansible, you are mostly talking about managing operating system instances. Similar techniques and technology apply to tools like Terraform, Pulumi, and the CDK. It's all related, but what you manage changes. Puppet introduced resource management, and Chef brought the idea that you should write configuration as software. The DSLs were fantastic when things were straightforward, but Chef would let you do crazy shit.

Chef used a programming language instead of a DSL (domain-specific language) like CFEngine or Puppet did. Instead of learning a specific DSL syntax, you used Ruby and wrote an internal DSL in Ruby to express your configuration. This approach was particularly useful when things got complicated. While DSLs were fantastic for straightforward tasks, Chef allowed for more complex configurations.
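That internal-DSL approach is easy to sketch in plain Ruby (a toy collector written for this article, not actual Chef code, though real Chef resources like `package` read similarly): because the "recipe" is just Ruby, loops, conditionals, and interpolation come for free.

```ruby
# Toy internal DSL in the spirit of a configuration recipe.
# Each DSL call records a desired resource; a real tool would
# then converge the system to match this list.
class Recipe
  attr_reader :resources

  def initialize
    @resources = []
  end

  def package(name)
    @resources << [:package, name]
  end

  def file(path, content:)
    @resources << [:file, path, content]
  end

  def self.evaluate(&block)
    recipe = new
    recipe.instance_eval(&block) # DSL methods resolve against the recipe
    recipe
  end
end

recipe = Recipe.evaluate do
  package "nginx"
  # Full Ruby is available inside the "DSL": loops, conditionals, interpolation.
  %w[app1 app2].each do |site|
    file "/etc/nginx/sites/#{site}.conf", content: "server_name #{site};"
  end
end

recipe.resources.each { |res| p res }
```

The design trade-off the passage describes is visible here: an external DSL could forbid that `each` loop to stay simple, while an internal DSL hands you the whole language when the straightforward path runs out.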

A great example is a big media company with a popular TV show where people had to vote once a week. They had a huge amount of infrastructure running all the time, just waiting for the one day a week when the votes came in. This team automated that infrastructure, allowing them to turn it on and off the day before and after voting. They did it with Chef, and they were excited to show me. I went to the meeting, and they had written a bunch of Ruby code, using Chef primarily to execute their code. Chef was the runner. My reaction to that moment was fucking fantastic. I was like, "Amazing. You crushed it," because they did. They fucking killed it. They completely solved their problem. Then I was like, "Let me show you how to solve it with Chef. We can make it a little leaner, a little more elegant." But the point of view was that this is ultimately a pragmatic choice where you're creating a powerful tool designed for experts to solve really powerful problems. That's really what Chef was about. It came from my experience using other tools while trying to have big ambitions, like automating everything. The idea was, "What if everything was completely automated?" If the language is constraining, the answer is, "Well, mostly everything, but it only works 200 times, not 250 times in a row." And you're like, "No, I need that shit to work 250 times in a row."

Nice. So it's funny. The first time I was in a very similar scenario, I remember we had an internal tool that was somewhat declarative. It was YAML-based. Then I got introduced to Chef by the team at Engine Yard. I'd moved on to another company, and we had this nightmare homegrown YAML-based infrastructure management tool. I love myself some YAML, but not that way. I first saw Chef at Engine Yard, and it pretty much set my career path around DevOps and managing cloud infrastructure.

Yeah, I mean, Ezra Zygmuntowicz. If I hadn't written Chef, Ezra would have written Chef.

Yeah.

And, you know, when we found each other and I got to show him what Chef was like, it was really Ezra who first saw Chef and understood what it could be. He was like, "That's what I want. It's what I've always wanted, and we're going to use it at Engine Yard." Before Chef was publicly launched, Ezra and I were sitting in that Engine Yard office building Chef together and making it work for Engine Yard's needs. Unfortunately, Ezra is no longer with us, but he was an incredible trailblazer in this field. I think enough time has passed since his passing that many people forget, but Ezra was pivotal in the history of Chef and certainly in my life.

So, yeah, he was also the first person to introduce me to Erlang, which Chef's server was written in.

Yeah, exactly. Rest in peace, Ezra. What a guy.

He's an awesome guy.

So we're now having an ad hoc Ezra memorial. So I too am having a moment where I'm like, "Oh, man." I'm remembering my lost friend and thinking, "I kind of wish I could call Ezra right now and say, 'I was just talking about that. Do you remember how great that was?'"

The Evolution of DevOps and Infrastructure as Code

Oh, your friends, man. Cool. So was Engine Yard your first big customer win or, like, first use case? Or were there other surprising use cases or industries, besides the company that was using Chef as a runner?

I mean, that was later on. I mean, Chef was everywhere, and both Chef and Puppet created what I think is the modern DevOps market. Those companies were pretty pivotal in doing so. CFEngine also played a role, though with less commercial success than Puppet. These tools and companies were trailblazers in creating a broader market that eventually led to the emergence of companies like HashiCorp, Pulumi, and others.

Early on, Chef's customers included those who had previously used Puppet and encountered similar challenges that I had faced. There was a particularly troublesome bug in Puppet that was difficult to track down, and I collaborated with a kid in New Zealand who could reproduce it. After helping fix that bug, I decided to move on. He became an early Chef user. There was a community of people who came together to help build those early versions of Chef and make it work.

Ezra was certainly the first person, and Engine Yard was the first commercial user of Chef. They began using Chef in production almost immediately.

So it's funny, I was reading the State of CD report, and you know, with all these reports, we don't have access to all the data behind them. But it said something like only 27% of about 19,000 companies were doing IaC in 2023. That number might not be exactly right, but I feel like a lot of companies that I meet in my day-to-day are still doing infrastructure as code. And at the time when you were working on Chef, what types of challenges and cultural hurdles did you face? This was happening right around the same time that the actual DevOps movement was getting founded.

Yeah, the DevOps movement sort of came a little after, within a few years; it all blurs together in the mists of time. It's no big deal, but at the time, the configuration management and automated infrastructure movement got swallowed up in DevOps as part of a bigger transition, because it was clear that the capabilities these tools provided weren't enough to get to the full business outcomes that people really wanted to have. They were great, but there was a bigger transition that needed to happen in order for people to really get to a place where they were, you know, reliably shipping software.

At that time, the target was like ten times a day, which a lot of people still really struggle with. Like, we chuckle because we're like, ten times isn't that often. But, like a lot of people who listen to this podcast, they can't ship that shit ten times a day. And if you go look at John Allspaw and Paul Hammond's talk that they gave at Velocity, which was ten deployments a day at Flickr and how they did it, that's DevOps.

Modern Software Delivery and Cultural Shifts

And today, that's still what we do. It doesn't matter who you are, or that this is a platform engineering podcast. People might say, "Well, we've transcended DevOps because now it's platform engineering." You haven't. You're doing it exactly the way John and Paul told you to. The only difference is the tools you've chosen. You might not use Ganglia anymore or whatever, and you might have a different dashboard, but the workflow and the way you approach this work is exactly as they outlined back in 2009. I think the cultural transitions we had to undergo to reach this point are often forgotten. People forget what it was like when operations and development were completely separate. Remember when Systems Administrator Appreciation Day was a thing? It's funny how only underappreciated roles get appreciation days, you know what I mean?

I mean, valid, but ouch.

I mean, it hurts.

Yeah.

As someone who spent most of my professional life as a person no one cared about (I was a Morlock, not an Eloi, and I'm allowed to say it because it was true; they were my people), some of those cultural transitions were just getting people to recognize that the operations piece mattered, that it was as important as the engineering piece, and that there was a valid career path there. Becoming great at operations was the same as becoming great at engineering, and as you became good in both paths, you eventually reached the top of the pyramid and the two became sort of indistinguishable from one another.

I think another was the idea that for a long time developers were allowed to write code and operations people weren't. They could write shell scripts, do some Python, or whatever, but they weren't real programmers. You still see this a little now. And so it was just getting people to the place where they recognize that, no, actually, if you are a skilled operations person who understands how to put together a complex infrastructure at scale, configure and manage it — all of that — you're a programmer. The language you're writing happens to be the language of this infrastructure: how the switches are configured, how the cloud resources are instantiated, or whatever. But you're programming, and what we need to do is start thinking about it that way and treating you with the same kind of respect that software developers had.

Now, due to some of the issues with how we've structured our systems and taught people to use them, we still automate infrastructure and the full stack exactly as I designed in my consulting company almost 20 years ago. I'm so proud of that because it's very rare to be a part of something that has that kind of longevity and works as well as it does, but it hasn't really changed. And so the outcomes that we're seeing, especially in big companies, tend to be mediocre. People tend to fall into the middle where they're like, “Well, we had these big aspirations, but mostly we got ground down because the way that we work didn't change enough.” 

I think now the challenge is that people are ascribing some of the positive cultural change that came from the DevOps movement as the problem. They're like, "Well, now the problem is that my developers have to write all this operational code," which was never the point of DevOps. And now we start to see silos reemerge. You start to see people's marketing material say things like, "Engineers should write code, and all the operations stuff should be an API that people hit through a thing." There's some risk in that messaging of going back to how it was in 2001.

Yeah.

And, you know, most people don't remember what it was like to do this work in 2001 because most of them weren't doing it. And so they will get to rediscover for themselves the joy of what happens when you build those kinds of walls in between the teams. 

Yeah. And it's funny, like, the ten times a day thing. I'm a big reader of reports. I love reading the industry reports. But the last DORA report, like, the first three pages are all thumbs up, and then you get to page 37, and it's just sadness. The number of companies still deploying once a month?

Is the majority.

Yeah, it's a real thing.

It's a real thing. But we've just moved the goalposts. Somewhere on this bookshelf behind me, I have a copy of the DevOps Handbook, and I think the very first paragraph of the DevOps Handbook is, “We want every company in the world to be deploying hundreds or thousands of times a day with ease and grace.” And that shit is not what's happening. You know, if you just look at our own aspirations and what we wanted to have happen for each other as people, forget about the impact on industry, just what we wanted the experience of our peers to be. We just didn't quite deliver what we hoped for. I'm proud of what we did deliver. I'm so proud of it. It was really difficult, and it's significantly better than it was. It was great, but it did not live up to what we hoped it would live up to, and it's a bummer.

I think one of the most rewarding things as a software developer is seeing your code in prod, not just sitting there writing it and watching it sit in an open PR. That goal of being able to have that reward happen multiple times a day didn't get realized for many people, and that definitely sucks. I feel like in that scenario where you're deploying once a month, you end up with more anxiety about that time rolling around.

Yeah, you probably wind up with more anxiety. I think even if you don't, you definitely wind up more disconnected from the outcomes of your work. If you look at the DORA report or studies like it, the number one indicator isn't how often you deploy, but rather the consistency of those outcomes. We know that the number one driver of consistently high outcomes is how tightly the team is connected. It's how frequently we talk to each other. It's how together we are in that loop of: what are we building? Who is it for? How do we work together to do it? In modern software delivery environments, especially in large enterprises, the complexity is too high for any one person to manage alone. You can't expect one person to understand how all the code flows through a large organization like Citibank. It's just not a thing. So you have to build teams that can effectively collaborate through these mechanisms. And we're still learning how to do that.

Now, this first generation is, I think, still where we are for the most part as an industry. We took the way we did it as startups and as engineers, and then we just tried to blow it up to a much bigger scale for everybody. In retrospect, it turns out that it didn't work as well as we hoped, but it wasn't a bad idea, you know? It's not that it was a bad choice. It was a pretty reasonable thing to decide to try. But I think history tells us it didn't work quite the way we thought.

Technological Advancements and Their Impact

Yeah. And at the time you were working on Chef, and honestly for most of the past 20 years, the way that we operate software has changed a lot. Right? Twenty-five-ish years ago we were all in data centers, and then, 20-ish years ago, we had VMs, right? And then all of a sudden containers came around, then big containerization with things like Kubernetes and serverless, et cetera. The cloud changed a lot over those years as we started getting containerization. How has that impacted your philosophy around operations and the work you were doing in Chef at the time? And how has that led to where you are today?

Each of those things brings both a technological capability that you didn't have before and a new user experience. People forget, but in the era before EC2, that was the era of Facebook apps. Facebook apps were the first time on the Internet that you could have a thing that you launched yesterday, and today you have 10 million users. I have friends who I had worked with for years who were working for companies that launched Facebook apps, and suddenly that happened to them. They were literally calling everyone they knew, just begging for gear. They were like, "Do you have rack-mount servers in your closet?" Because they literally couldn't rack systems fast enough for the immediate demand.

We made choices about how to solve those problems, in terms of Amazon building EC2 and others building similar stuff. Eventually, the good user experience pieces win out. A good example is Docker. For as much good as a lot of other containerization stuff has done, the true magic of Docker is the Dockerfile. It's the ability that once I express what I want in this thing, which I can do pretty much by writing a shell script, what I get out is this repeatable artifact that then boots really quickly. Everybody who ran Docker ran whatever the service was they wanted, hit enter, and it just downloaded an image from the Internet and ran that shit. You would crawl over broken glass for that. It changed what it meant to think about experiencing this kind of automation. Forget about what it did technically. The user experience was such a leap forward that you could never go backwards. Once you experienced it, you were like, "Oh, if it's not like that, then it's garbage."

Eventually, those sorts of capabilities find their niches. They settle into the places where they make sense and fall away from the places where they don't. Things like containers, it turns out, make sense in a lot of places. Things like serverless may make sense in fewer places than we thought, at least in its current form. How it changes your approach to automation and tooling is mostly a question of how you want the interaction model to change, less a question of how you want to change the idea of automation itself. Given these new capabilities, what new interactions can I enable for people at different parts of the process?

If you're an operations person who used to have to build automated provisioning systems by making the right layer two networks, DHCP servers, BOOTP, and this whole list, then you could get to a place where you could rack servers, and they would automatically install themselves. Now you have to figure out how to build an AMI. But it's roughly the same thing as figuring out what the BOOTP image was that you were going to boot when you built a data center. It's exactly the same. What's changed is the interaction, what changed is the API. Focusing on that layer is where, to me, all the interesting stuff lives because you can't control which technology is going to come into the world, what people like, or what they don't like.

I've never really loved Kubernetes, not because I think it's bad technology or that people who like it are bad. Most of the things I have built seem to not need it. This isn't a condemnation in any way of people who do need it or who love it; it's great. But it doesn't matter if I like it or not. It's a thing that I need to automate because people do like it and want to use it. If your argument as someone who automates stuff for a living begins with, “Well, you should only use technology I like in a way that I like it,” it tends to be a losing argument. You have to think about how to build automation that can automate things you don't understand, don't like, don't expect, or don't agree with. Frequently, the winners in this space are people who embrace that kind of pragmatism. There's always a lane for the people who don't, who are like, “Hey, this is a very opinionated, one-way street kind of move.” Over time, I think those tools tend to do less well.

Chef's Lasting Contributions to DevOps

Looking back at Chef and where you are today, what do you think that Chef's most significant lasting contributions are to the infrastructure as code and DevOps field?

The thing that Chef did that I carry with me more than any other choice we made is its impact on people. We had plenty of technical contributions to the art of how you build that kind of automation, but I don't hope that people look at Pulumi and go, "The Chef guys are the ones who taught us how to do that." That's fine. I don't care. What I do care about and what I think is the lasting legacy of Chef, and what I'm most proud of, is that Chef, as a product and as a community, transformed the lives of a significant number of people. Because of where we were when we started, who we were as people, and how we brought that software to them, they learned that they were more than they thought they could be in terms of what they could do in their job, the impact they could have, and how much fun it could be.

It turns out when you fix the interaction model of a job that sucks, the job doesn't suck so much. Then people can thrive in ways they couldn't before because the day-to-day grind of the work was holding them back. When something like Chef comes along, it's not Chef that did this for them. They did it for themselves because they found Chef and thought, "This is my ticket." They used Chef to materially alter the course of their lives. There's no better legacy than knowing that's true. I know it's true because people still come up to me today and say, "Are you Adam Jacob? Did you make Chef?" Then they tell me how Chef changed their life. What better possible validation could there be?

The rest of it—how successful were you at running venture capital companies? How well did you adapt to strategic disruptions? Those are interesting and fun stories, but when you think about what matters, what I hope is that the people touched by that software, who made it a part of their lives and improved their own lives through their work with it, look back on those moments with pride and affection. I hope it was as good for them as it was for me.

Challenges in the Era of AI

Yeah, awesome. I'm one of those people. Honestly, I was a Ruby developer for a very long time, and I kind of fell backwards into what we now call a DevOps role. And dealing with these internal tools we built was miserable. The first time I used Chef, I was just like, "Wait, this is where my worlds are converging." My development side had all this Rails experience, plus the ability to manage infrastructure, which I was interested in; seeing those two worlds come together was very exciting for me. We're in a new world now, though. I feel like when we look at where the cloud has gotten us, our applications are just a bit different. For companies that are truly leaning into the cloud, we're not just running on the cloud; our software is composed of the cloud. SQS queues, SNS topics, Glue jobs—our infrastructures have gotten pretty wild, maybe even unwieldy. What do you think are some of the biggest challenges we're going to face from this point on, especially with the advent of AI and where AI is going? What challenges are we going to have as far as operating our software, and what challenges do you think we're going to have culturally and as part of our teams?

I think the biggest challenge is the user experience. Whether you simply sit with the people who do the work, or you're a person listening to this podcast who does some of this work yourself (as an application developer, an operations person, a DevOps person, a platform engineer, whatever your role is), just ask yourself, "How much do I like the experience of how all these things come together, and what does it feel like to do my day to day?" And, "Where are the moments in my career where I felt true joy, where the work I needed to do and the tools I was using to do it aligned to give me that moment of just pure, perfect zen?" Rails is a great example. Every single person who touched Rails in that initial glorious era was forever transformed by what it meant to write applications like that. Man, it was so much better. I mean, it wasn't a little better. It was crazy better.

Revolutionizing Infrastructure Management

Once you felt it, everything from there forward had to be at least that good. And if it wasn't, it's not that you were a Luddite for rejecting it; it's that it was worse, and no one was going to take that joy from you. The same thing is true for Docker, right? No one is going to give up the joy of running Docker build and deploying. It's so good. I think the user experience challenge for us is that we haven't really rethought the way the entirety of the system connects together. We still essentially automate our workflows, the cloud, our data centers, and all of those components exactly the way we automated stuff when I started a consulting company 20 years ago. What's different are the tools. You don't use the provisioning scripts anymore; now you use Terraform. But if I were building a checklist of the things you had to cover off on, it fills the same slot. You do it at the same time, and it relates to the other tools the same way.

And right now, for a lot of the industry, the best answer they have is, "Well, we'll abstract those things from you. I'll give you a portal, I'll build a different kind of API, I'll do whatever." But they're not really changing the fundamental shape of how we're doing the work. And my belief is that the trouble is we've done enough innovation in the last era to know that simply adding better tooling in the same slots in the stack just isn't going to change the outcomes. And more abstraction on top of that same tooling, I don't think, will change it either, because while it can change the user experience for some parts of the work, for others it's going to make it materially worse. Your ability to understand what's happening at the low level will go way, way down, because your only interactions are these really high-level platforms. Once those platforms don't work, your ability to pierce that veil and solve the deeper problems goes down, which means the platform will be adopted only in places where it fits and not in places where it doesn't. That hurts the value of that abstraction over time, which then kind of leads you to the DORA report, where you're like, "And in the end, on average, we deploy once a month."

And so I think our challenge as an industry is that we have to fall in love again with the fact that we have control over these systems. These systems are not bigger than we are. They're not more complex than we can understand. We can know the details. That enables us to envision a different shape where the outcomes are different. But we don't have to lose what we learned over the last 20 years of doing this work. We can evolve in a way that really leaps us forward. I think that's the challenge for the industry. It's great and convenient to sort of stand on the shoulders of giants, but those giants didn't come from cautious leaps. CFEngine 2 wasn't a minor leap from shell scripts.
It was an incredible leap from shell scripts. It was a massive transition. And, you know, Puppet and Chef, I would argue, were smaller transitions than CFEngine2 was. Puppet was a pretty big transition, but Chef to Puppet wasn't that huge of a leap. There are some fundamental differences, and using a programming language is different, but we're in an era now where what we need are new giant leaps forward. Those will be riskier, they'll be messier, they'll fail harder. But we need more of them, because without them, we already know what the results will be. And so I think that's a challenge for us as engineers, to say to each other, “No, Cory, dream bigger, get weirder.” We're just not weird enough anymore, because the old leaps have become the standard. People take for granted that the way we built those configuration management systems is now the way your infrastructure as code tools are also built. Their architecture is the same. There's no real difference between the internal engine of how Terraform reconciles state and how those configuration management systems did it. The only difference is that Terraform doesn't have an operating system it can interrogate, so it has to store the state as data somewhere, and it slaps it into JSON. But is that the best we could do? Was that the right design? It's a fundamentally different part of the stack. Maybe it wasn't the best design. Maybe there was a different way we could have gone. And I don't think we're doing enough exploration of those alternative paths. Instead, we're sort of resting in the middle.

Yeah. And I feel like you have a great talk about this, I can't remember exactly how you put it, something like, what if infrastructure as code never existed? And then I see two big leaps happening now: the work you're doing with System Initiative, and then this other direction with Winglang.

Totally right.

System Initiative: A New Approach

So can we talk a bit about System Initiative? What are some of the motivations behind System Initiative?

Yeah, we can talk about Wing too. I think you're totally right that Wing is another perfect example of what I'm talking about. For System Initiative, it starts with the conversation we just had. That was the motivational piece for me, and also for my co-founders. That belief that it was fundamentally a user experience problem led us to do four-plus years of R&D to get to a place where, very soon, you're going to be able to use it in production. That's how hard the user experience problem was. If you wanted to deliver a fundamentally better user experience and sacrifice none of the power of your existing automation, it took four years of really deep R&D to get to a place where I believe that's possible. It was not hard to build a toy that showed it would be neat. It was really hard to build a power tool that could hold up under the really complex, never-before-seen conditions of a production environment like a bank.

How System Initiative thought that through was by shifting the perspective. One of the choices we made early on, as an industry, was that we were going to automate these systems by writing code and treat them roughly like application artifacts. The things that made building applications stable (continuous delivery, continuous integration, source control) we could just extend naturally into the way we relate to infrastructure, and through that extension we would gain all the benefits and the degree of user experience we enjoyed, because we all wanted it to be programming. It turns out that the environments we're managing, to your point from earlier, are significantly more dynamic than that. They change from all kinds of different points of view. They change underneath you. The cloud providers go up and down, services crash. There are a million things that happen that make those assumptions less good.

We wound up building systems that start from the fundamental position that the source of truth is software, which is static, and is supposed to be a reflection of the truth. But the truth is this incredibly dynamic environment that we don't really understand and can't interrogate. And so System Initiative took that stuff and turned it into data. Then we took that data and put it on a reactive hypergraph, because what you need is the ability to program it. You have to say, “OK, I have some representation of what the truth is in the real world; now I want to be able to really quickly make a change and see if it would work.” Well, if I have to apply that change to the real world, it'll take 20 minutes or longer, and maybe it's production infrastructure, and I don't want to take it down just to experiment. So you have to be able to fork that thing, fork the data. We stick it on a graph where each piece of data, like a string, is the result of a function that is itself reactive to its arguments. The side effect is that you wind up programming this big graph of reactive functions. And once we have that, I can build a user interface that's more like Unity, or more like the tools people use to make really complex movies, than it is a text editor, because I can visualize the data in interesting ways. It turns out it's much faster to compose infrastructure if different components of infrastructure can inform each other's configuration. If I have a Docker image that exposes a port number, I should be able to say that the load balancer that balances that container needs that port number as an input. And I can do it by showing it to you on a canvas and drawing a line. Then, if I change the port number in the container, it'll update the load balancer and tell me in real time whether that change would or wouldn't work.
It's kind of magic, but it took a very long time to build. Super complicated, and it's a big swing, but it's almost ready, and I'm stoked about later this year you'll be able to use it to run real production infrastructure. And it's sick. 
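The reactive-function-graph idea Adam describes can be sketched in a few lines of TypeScript. To be clear, this is a toy illustration of the concept, not System Initiative's actual engine; the `Cell` and `derive` names are invented here. The point is that a derived value (a load balancer's target port, a policy check) recomputes automatically when its input (a container's port) changes, with no redeploy and no file to re-parse.

```typescript
// A toy reactive graph: each value is the output of a function that re-runs
// whenever any of its inputs change. Invented for illustration only.

type Subscriber = () => void;

class Cell<T> {
  private subscribers: Subscriber[] = [];
  constructor(private value: T) {}
  get(): T {
    return this.value;
  }
  set(next: T): void {
    this.value = next;
    // Re-run every function that depends on this cell.
    this.subscribers.forEach((fn) => fn());
  }
  subscribe(fn: Subscriber): void {
    this.subscribers.push(fn);
  }
}

// derive() builds a cell whose value is a function of another cell and
// recomputes whenever the input cell changes.
function derive<A, B>(input: Cell<A>, fn: (a: A) => B): Cell<B> {
  const out = new Cell(fn(input.get()));
  input.subscribe(() => out.set(fn(input.get())));
  return out;
}

// A container exposes a port; the load balancer's target port is derived
// from it, and a simple policy check reacts to the same input.
const containerPort = new Cell(8080);
const lbTargetPort = derive(containerPort, (p) => p);
const portAllowed = derive(containerPort, (p) => p >= 1024);

console.log(lbTargetPort.get()); // 8080
containerPort.set(9090); // the change propagates through the graph
console.log(lbTargetPort.get()); // 9090
console.log(portAllowed.get()); // true
```

Compare this with a static variable in a source file: to make that "react," you would have to re-open, re-parse, and re-commit the file, which is exactly the limitation Adam points at.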

The Future of DevOps and Infrastructure

Yeah, and I think Wing... Sorry, I'm going to jump into Wing because I really like it, and I love Elad, who is the CEO of Wing. Wing's perspective is different. It asserts that for cloud systems and tools like SQS queues, the programming language itself, the way we express application and business logic, needs to seamlessly integrate with infrastructure as part of the code we write. By doing so, we could build simulators the same way we build mocks for tests, providing incredibly rapid feedback loops for locally building and testing complex cloud infrastructure.

Using programming-level abstractions, developers can understand how messages coming in and out at runtime map to different pieces of infrastructure. What's intriguing is that Wing bets heavily on user experience. It bets that the detailed infrastructure level doesn't matter; getting the programming primitives right lets programmers focus on their core tasks, while operations teams focus on integrating new infrastructure only when necessary.

I'm interested to see if Wing's approach succeeds. Though I'm uncertain, I say this warmly. We recently had dinner together, and I'm a fan. What a bold bet! Even if it doesn't prevail, it pushes the boundaries of what's possible. Undoubtedly, if you're building systems using modern serverless technology and have to choose between Wing or traditional serverless, Wing offers a significantly better experience all day, every day.

Do I think Wing will become the primary mechanism for automation industry-wide? No. But System Initiative might, and that's why I'm building it.

There you go. One of the interesting aspects I find, from what I've gathered in the videos, and I've noticed this quite frequently with operations engineers, as you mentioned earlier (and I'll note your use of quotes around "real programmers"), is that they do matter; their work is truly important. Behind even the simplest "Hello, World!" on the Internet, there's a ton of operations happening totally behind the scenes. Yet I still encounter many operations professionals today who may not be proficient in programming, at least not in the more formal languages. They know Bash; perhaps they're learning some HCL. These individuals are generally outnumbered by their engineering counterparts, right? They're always overwhelmed with debt, and the idea that they have time to sit down, learn to program something, and learn about all the cloud resources, while managing that debt and serving their engineering customers, is quite demanding for a small team. So being able to start diagramming stuff in a layer we can all look at and understand, whether or not you understand the language, seems pretty innovative.

The tricky part is that it's only useful if what you're expressing is the actual complexity that then turns into the work that you need to do. If it's just like, wouldn't it be great if we looked at a diagram together? The answer is no. That's why we don't spend a bunch of time diagramming stuff before we do it. Yeah, I mean, sometimes we do, but mostly we don't because it's mostly a waste of time and you might as well just fucking do it. 

So, it is novel that that's the interface System Initiative presents to you. But the reason we present it that way is even more novel. It turns out that 99% of the difficulty in writing infrastructure as code isn't in expressing a single resource as code. It's in expressing the relationships between the web of those things and how that data informs one another. Those relationships, expressed as variables in source code, can't react to changes in the real world, because they're static data. To make them reactive, I'd have to open the file, parse it, write back to it, and check it in. It blows your mind when you think about trying to make that reactive.

That's what's incredible about that diagram. If you look at the diagram that is the primary composition interface in System Initiative, it's easy to look at it and think, "Well, it's like a diagram tool." But no, no, no. What that thing's doing is writing the data that then gets transformed into the code that manages the real world and can react to changes in real-time data. That's amazing. And that's why it's cool, because it's not just simplifying an incredibly complex thing for you. The story of many failed technologies is "I'm going to take this incredibly complex thing and simplify it for you," and history shows that often doesn't work. It turns out that most people's complexity is there for a reason—you made those choices for a reason. The tools you use have to be able to handle that complexity and your reasons for choosing it. If they don't, then they're not useful.

So what has been the hardest part of building and designing System Initiative?

So many different things. I mean, you know, it's taken a long time. It's not because I didn't want to ship it, you know what I mean? It's not like we've been sitting here hoping it could take longer. There are a lot of things that were hard, but the hardest has been getting to a place where you understand that the reactive graph is actually the core primitive. Once you realize that, everything winds up on that graph.

When you think about it, okay, it's this big web of reactive functions. How do I manage the functions over time? How do I let you change the simulation in real time? Let's say you've deployed a bunch of stuff, and you have a new security policy. An audit comes through, and they tell you, "Hey, we have to make sure that all our Docker containers only come from an internal registry we control, no more public Docker images." In System Initiative, you can go to the built-in editor, open the Docker asset, and just add a qualification that says all images come from our internal registry. It's just a little TypeScript function. Then you press a button, and it will upgrade all the Docker images in your entire system to reflect that new behavior. You can do it in a change set that shows you that result only for you. If you flipped out of that change set and I had another one open, I wouldn't see your code change that changes the behavior of those Docker images. We would wait until you apply that thing to the main view. That is the canonical reflection of what you say you think you want and the real world at the same time.
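Adam describes the registry policy as "just a little TypeScript function." As a sketch of what such a qualification function might look like, note that the function shape, result type, and registry hostname below are all invented for illustration; System Initiative's real qualification API may differ.

```typescript
// Hypothetical qualification: given a Docker image reference, check that it
// comes from an internal registry. Names and shapes are illustrative only.

const INTERNAL_REGISTRY = "registry.internal.example.com"; // assumed hostname

interface QualificationResult {
  qualified: boolean;
  message: string;
}

function imageFromInternalRegistry(image: string): QualificationResult {
  // "registry.internal.example.com/team/app:1.2" qualifies;
  // a bare Docker Hub image like "nginx:latest" does not.
  const qualified = image.startsWith(INTERNAL_REGISTRY + "/");
  return {
    qualified,
    message: qualified
      ? `${image} comes from the internal registry`
      : `${image} must be pulled from ${INTERNAL_REGISTRY}`,
  };
}

console.log(imageFromInternalRegistry("nginx:latest").qualified); // false
console.log(
  imageFromInternalRegistry(`${INTERNAL_REGISTRY}/team/app:1.2`).qualified
); // true
```

In the workflow Adam describes, adding a check like this and pressing a button would re-evaluate it against every Docker image in the system, inside your change set, before anything touches the real world.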

Once you hit that button, I have to be able to take that code that merged to the mainline branch that reflects reality, and I have to update every open change set, because reality has moved while you're also making changes. That's incredibly hard. And it's true for every property of every configuration variable you set. It's true for sharing across different people. If you build new functionality and you want to share it with me, I need to extract it from your graph, import it into mine, and then track that lineage. So when you publish changes and I've also made changes (because I can edit any function at any time; nothing's closed, there are no private things), I have to be able to track all of that, for all of it. Making that system work reliably, powerfully, and at speed is no joke.

Yeah, very cool.

And it kind of has to all work or not at all.

Yeah. Yeah.

Like, because infrastructure as code is really good. So if we can't beat it, if it's not fantastic, then it's not good enough.

Advice for Aspiring DevOps Engineers

Has to be a big enough leap. Can't go backwards. Awesome. Well, I want to be respectful of your time; I know we're getting close to the top of the hour here. I'd love to know what advice you would give to current or aspiring DevOps or InfraOps engineers. I made the mistake of opening Twitter this morning.

Oh, no. Why do you have to be like that?

What advice would you give to people that are aspiring to be DevOps engineers or systems administrators or just anybody moving forward in their career?

Oh, being great at your job, being great at this work, is fundamentally about how connected you can be to the teams that you work with and the outcomes that the organization you work for is trying to achieve. Getting great at communicating with other people, working closely with them, and understanding what they need is crucial. It's not just about solving the problem but finding the best path that opens up a solution that's better than anyone could have imagined.

People are often right about the problems they have. They know their leg hurts, they know they stubbed their toe, etc. But they're often wrong about the solution. If I had one piece of career advice, it would be to get comfortable and good at interrogating the connective tissue between the work in front of you and why it matters. If you can get really good and relentless at doing that, your ability to come up with incredible solutions will spike, and that will always serve you well, career-wise.

Awesome. Well, I really appreciate the time, Adam Jacob of System Initiative. You can check them out at systeminit.com, and the project is also open source.

Everything's open source. Every line of it's open source.

So get in there and close some issues.

Wouldn't that be a treat?

Are there any good first issues?

Oh, not yet. I mean, it's a complicated code base, but I'd love to have someone contribute, so I can't wait for you to come, if that's what you want to do. Like, I'm all in. Come hang out with us in Discord. I've got bugs; I'll give them to you. But I think more of that will come as people use the platform.

Closing and Contact Information

Awesome. And where else can people find you online?

You can find me on Twitter, I'm @adamjk. That's where I'm sort of most active, for better or for worse. I'm adam@systeminit.com, so you can always send me an email. My phone number is public all the time, and I'm an old person, so I answer the phone even if I don't know whose number it is, which is dumb.

But if you're a sales rep, ignore that.

It's not going to help. It's fine. So, if you have questions, you can always call me. And, yeah, the things we decided to do, as engineers, as entrepreneurs, and as executives, all of those things are hard. So I try to be as available as I can be. If people have questions or need help, I try to offer it as much as I can. So if there's anything you think I can help you with, I'm happy to do that, if you ask.

That is awesome. Well, thanks so much for coming on the show. I really appreciate it.

Yeah, my pleasure.

Featured Guest

Adam Jacob

Co-Founder of Chef and System Initiative