Episode 38

Published on: 22nd Oct 2025

Guest Host: Kelsey Hightower - Are CI/CD and GitOps Just Making Things Harder?

What if your production environment had a live, trustworthy blueprint you could zoom in and out of on demand?

Kelsey Hightower guest-hosts a candid conversation with Cory about why CI/CD pipelines and GitOps often break down for cloud infrastructure. They explore a simpler operational model: treat infrastructure as data, lean on clear checkpoints instead of rigid “golden paths,” and make production legible for both developers and ops.

You’ll learn:

  • Where CI/CD adds friction for infra and what to do instead
  • Why GitOps works for apps but hits limits for databases, networks, and multi-region realities
  • How “living diagrams” help new teammates understand prod on day one
  • Practical guardrails that evolve with your org without locking teams in
  • Ways to reduce drift, surprise cloud costs, and Day Two chaos
  • A mindset shift: databases for ops data, not shell-script archaeology

Walk away with concrete patterns to make production understandable, auditable, and easier to change—without more YAML or bigger pipelines.

Guest Host: Kelsey Hightower

Kelsey has worn every hat possible throughout his career in tech and enjoys leadership roles focused on making things happen and shipping software. Prior to his retirement, he was a Distinguished Engineer at Google, where he worked on Google Cloud Platform. He is a strong open source advocate with a focus on building great software as well as great communities around them. He is also an accomplished author and keynote speaker with a knack for demystifying complex topics, doing live demos and enabling others to succeed. When he is not writing code, you can catch him giving technical workshops covering everything from programming to system administration.

Guest: Cory O'Daniel, CEO and Co-Founder of Massdriver and Co-Founder of OpenTofu

Cory has been a software architect and engineer for 20 years, leading up to the founding of Massdriver. He's also a husband and the father of two kids.

Cory O'Daniel, X

Cory O'Daniel, Medium

Massdriver, website

Massdriver, GitHub

Massdriver, YouTube

OpenTofu

Links to interesting things from this episode:

Transcript
Cory:

Welcome back to the Platform Engineering Podcast. This is part two of a three-part series guest-hosted by Kelsey Hightower, interviewing me about what I do during my day job.

Last time we covered my background: the jump from bare metal to cloud and serverless, and how choice overload and ClickOps can make governance painful. But 10 years in, 15 years in, the real bottleneck most orgs feel is in adopting IaC. And it's not the code itself, it's how to pick the right, safe, cost-aware compliance settings.

Massdriver's take: guided interfaces with built-in guardrails that can evolve over time, not static rules. Today we're going to dig into what that looks like in practice and take an honest look at CI/CD and GitOps for infrastructure. Let's get into it.

Kelsey:

And then when you get into this world, you kind of want to avoid overly constraining things. You want to be a little bit more flexible. You kind of want to give people the illusion of choice.

But I think there's another component to this, which is: if you're going to be building out these tools, you have to understand you're never finished. And I think the idea of a golden path has thrown a lot of people off. They're like, "Oh, we're going to figure out the blessed path going forward. We're going to account for all the things right now, today, and that's what you're going to use." And it's like, yeah, I think checkpoints are a better idea: based on everything we know and the experience we have, this is how you can use these things today. And if one of those doesn't work for you, we know this should change. And we're not going to make a big deal about it, so if you really want to introduce a new flag or a new region, just help everybody understand, because that's the learning part. "We need a new region because... Going forward we want to make that option available because..." And once we all get consensus one time, we go back and update the Terraform to add that, we update Massdriver with the constraints to show it, and all of our monitoring tools to expect it, and then we've learned permanently and we don't have to rehash this debate again.

You have to do that bit of work upfront. I think a lot of people have been afraid of that product development loop because they feel, "Oh, this feels a little bit like waterfall. It isn't agile enough," or, "It's too much flexibility. We can't manage this much flexibility." But if you get into this way of working, things like Massdriver should just be checkpoints along the way. So if someone joins the company tomorrow, they should benefit from the last 15 years of you guys learning how to work together.

Cory:

Yeah.

Kelsey:

And then just using the best of breed that's available. And if they have new suggestions, they can enter the loop as well. Versus rediscovering what's possible every single time someone wants to do something.

Cory:

Yeah. And I mean, that first day as a developer anywhere, is hard, right?

I mean, like, I know a lot of orgs now strive to get you into Prod on day one. I think it's super noble. I know it's hard for a lot of... some people are probably like, "Hell, it took me like two weeks to set up my laptop when I got here." But even if you put something into Prod on day one, there's zero likelihood that you understand Prod, right?

And so like it can feel like you're contributing. It can feel like you're putting in things fairly quickly. But like, Prod is a beast. Prod is a beast, right? Like it's so hard to fully understand the environment.

And that was kind of one of the other key things in what we were thinking about. It's funny like if you go to a whiteboard, and you draw Prod, it does not look anything like Prod. You probably didn't even finish spelling database. You got like "datab...", right? Like we draw in such low fidelity. And it's funny because like Prod is very high fidelity.

Like if you wanted to map Prod out, like if you wanted to really draw Prod out, you're drawing ENIs, which is so... like, it's just so down in the details that like, it shouldn't matter to most of the team, right? And so it's funny, like one of the things that we made pretty key early on when we were designing Massdriver is like we wanted it... and if you see like an early version of it, it looks this way... we wanted it to feel like you were drawing something on a whiteboard. We wanted big boxes. It's like database - well, that's the database. It's the right level of abstraction for me as a software developer. That's Postgres. And it's like you ask the Ops person, what is that? And they're like, "Well, it's a bag of IAM, it's a bag of firewall rules, it's some auto-scaling rules, it's probably some cost rules, it's some alerts, it's a whole bunch of stuff."

Now is that, all of that, the right abstraction for a developer? I don't think so. Because that's a lot of Prod, that's a lot of rules of the business, right? But like going back to the developer, it's like I just think of it as Postgres. I just think of it as Redis.

Oh, you know what I need? I need events. I brew install it, or docker run NATS. I'm done. I did it locally, right? I pop open my Docker Compose file, I just add NATS there, and I'm like, I've got NATS. That was so easy. How do I get this in production? Oh, that's not as easy anymore, right?
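The local version of that move really is about this small. A hedged sketch of the Docker Compose addition, using the official NATS image and its default ports:

    # Adding NATS to local dev is one stanza in docker-compose.yml...
    services:
      nats:
        image: nats:latest        # official NATS server image
        command: ["-m", "8222"]   # enable the HTTP monitoring endpoint
        ports:
          - "4222:4222"           # client connections
          - "8222:8222"           # monitoring
    # ...getting the production equivalent (IAM, networking, alerting) is the hard part.

The gap between that one stanza and its production equivalent is exactly the point Cory is making.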

It's like, that's what we were aiming for. We wanted it to look and feel at the right abstraction level for the developer. So they're seeing this like lower fidelity world while preserving like the high fidelity details of Infrastructure as Code.

And what's interesting is it makes production easier to reason about. And you can kind of like zoom in and zoom out, literally and figuratively. Like I joined day one, I got my thing into production and I'm like, "Oh my gosh, I changed the color of the login button." Nice, like, I feel good about this, but what does that mean? Right?

And what's really interesting is, like, when you look at a project in your Git repo, like, what does that mean? It went to prod. I don't know what that means. I look at my GitHub Actions and I'm like, "Oh, it ran this Argo GitHub action and it went somewhere, and I think it ran the database thing that's in my main.tf, 'cause I see this other GitHub action." And you're doing this thing where you're kind of clicking through, trying to get an idea of how the world gets to the way it is.

And in Massdriver it's like, "Oh, I'm gonna go look at my app" and you actually see a diagram. I see my app and I see my app is connected to a Kubernetes cluster and I see my app is connected to a database and both of these things are connected to a VPC, kind of the same way you draw it on a board. And for me it's like, I sit down, I'm like, "Oh, I understand Prod. I understand my visibility layer of Prod." And for the Ops team, they come in and they say, "Okay, I can see the network, I can see all the pieces underneath it. I understand my view of Prod."

And what's cool is when you hop over to another team's project for the first time, you see their app... you see their app here and to the left of it is a Dynamo table, an SQS queue and whatever. I immediately understand all of the cloud requirements of that app. I'm looking at it, I'm not like scrolling through a main.tf and I'm like, "Okay, there's a queue. How do we get the IAM policy for that queue? I don't know where that comes in." So it's much easier to just like reason about the world.

The funny thing is we're very visual when we think about infrastructure. And that might be weird for people that are newer, but like 20 years ago you were actually putting stuff in a data center. These are physical things, right? And then we try to draw these abstractions on a board that's also very physical. And then we're like, that's great. We put the thing in there, we drew the thing in there. We feel very good about how this looks. Let's go skeuomorph this thing into a GitHub action.

And we put it in this layer of fidelity that's hard to reason about. Right?

And so it's not even only just your day one or day zero requirements of like I need to self serve stuff, but it's like... with Infrastructure as Code, like at scale and adoption, how do you reason about it? That's the bigger, harder question. It's not how do I deploy this thing. It's deployed. It's going to be there for three to five years or whatever. And that day two is not a single day. It just keeps going on.

Kelsey:

You know what's wild, Cory?

Everywhere else in engineering: if you bought a piece of electronics and the people that were responsible for repairing it said there are no schematics...

Cory:

Nope.

Kelsey:

Yeah, we built it. We have no idea how this logic board is laid out. Good luck. Good luck. You'd be like, that's insane.

If you build a skyscraper and there is no blueprint anywhere.

Cory:

Nope.

Kelsey:

Once we finish building it, we threw the blueprint in the trash, we don't need it anymore. We're doing DevOps. So if you have a problem, just start poking holes in the wall until you find the pipes.

So that's what people have been doing for 30 years in our profession: literally saying, who needs blueprints when you have a hammer? You can just break the walls down. And so we've gotten away with that.

A lot of people have allowed us to be very immature in our discipline for the sake of speed or whatever excuse we've made up. I think what you're saying now is there's been two big benefits.

I think number one: we do finally have an API all the way down to the network card attached to your VM. You have an actual API to tell you about that thing. You have an IP address that also has an object and an API. So we finally have a digital twin representation of a lot of these stacks from end to end.

And so when you think about it, now we have a self-generating blueprint if you connect the dots. And so now, I think, most people have been used to going one way: you kick over the dominoes and the VM shows up.

But since you didn't use a blueprint to create the VM, you're just like, there's a VM, you can SSH into it. It's totally fine.

What we have now is enough data to say you have this VM connected to this network being sent traffic by this load balancer on behalf of this application that is using secrets stored in this path of this thing. So you have enough data. It's just we never got around to putting the graph back together and saying, "This is the state of the world."
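To make that concrete, here's a minimal sketch of walking one edge of that graph with the AWS CLI (the instance ID is hypothetical; the filter and query are standard EC2 API features):

    # Ask the cloud's own API which network interfaces (ENIs) are attached
    # to a given instance, and where they sit in the network.
    aws ec2 describe-network-interfaces \
      --filters "Name=attachment.instance-id,Values=i-0123456789abcdef0" \
      --query 'NetworkInterfaces[].{Eni:NetworkInterfaceId,Subnet:SubnetId,PrivateIp:PrivateIpAddress,SecurityGroups:Groups[].GroupId}'

Every hop Kelsey describes (instance to ENI to subnet to load balancer) has an API like this one; the missing piece is assembling the answers into one graph.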

So Declarative was great, Promise Theory was great, but we didn't do anything with all the data. And so now I think what you're saying is like, "Folks, you don't even have to do very much anymore other than have a little bit of discipline. You can now have a full live, self updating diagram of everything."

And so there's no reason anymore to start guessing and doing, you know, Mission Impossible level debugging when you literally have all the data to present the graph at all times.

Cory:

It's funny because like, I think it was Liz from Honeycomb posted this... I'll put a note in the show notes and find it so I can link to it... but I'm pretty sure it was Liz posted this thing on LinkedIn about people essentially leaving cloud resources running - like test resources.

That's another thing that we just don't... So, Massdriver runs on Massdriver. Like, we dogfood the thing, and it's painful to dogfood it. A thing that deploys itself as your product is pretty big-brain to think about when you're deploying. But we build everything through it. And so it's like, there's a cloud resource I forgot about. I can see the bill's a little high, and I'm looking at it, and without Massdriver I'm like, "What uses that Kubernetes cluster?" Or, I mean, a Kubernetes cluster is one thing, you can see what apps are in it, but what's using that SQS queue? Something's putting something in there and I have no idea what it is.

And it's funny, since we use Massdriver, we're like, "Oh, what's this SQS queue?" And I just take the resource name and pop it into Massdriver and I'm like, "Oh, there it is." And I can see that it's the demo app that we use for our sales calls, pushing events into this pipeline. And it's just been sitting there forever, like we just didn't know it was still running. And I just shut down the environment.

That is, I think some of the power that you can get to when you put in this discipline and you accept that like, "I don't know, maybe Git and workflows - like Git is our database for Ops and workflows are our server." Right? Like if you think about what our Infrastructure as Code is.

Massdriver tilts that entire thing on its head - all of your stuff is stored in a database. We can query it. You can do things like, "Hey, we just signed an RI. Find every t3.micro we have and what app is running on it." That's hard to do today. Like, how do you do that in AWS? Find all of your t3.micros. Day two is hard and it's long and it's tedious, and every single part of day two is a surprise, whether it's at 2am or at the end of a quarter. It's a surprise.
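For a sense of what that looks like without a queryable inventory, here's a hedged AWS CLI sketch. It assumes your instances carry an "app" tag, and you would still have to repeat it per region and per account, then map tags back to apps by hand:

    # Find t3.micro instances in one region and guess the app from tags.
    aws ec2 describe-instances \
      --filters "Name=instance-type,Values=t3.micro" \
      --query 'Reservations[].Instances[].[InstanceId, Tags[?Key==`app`]|[0].Value]' \
      --output table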

When you have to go into the console or you have to, you know, write a bajillion shell scripts to kind of get an idea of things because you don't really have anything stored in a database, it becomes much more difficult.

And so I think that's one of the things you can really yield out of just starting to lean into the idea that maybe the tools we built all this on... they were the things that were lying around when DevOps was being born, and maybe they're not the be-all and end-all for how we should be running these workloads.

Host read ad:

This episode is brought to you by SigNoz.

If you're building distributed systems and struggling to tie together logs, metrics, and traces, SigNoz gives you all three in one interface. You get APM-style visibility: P99 latency, error rates, external call tracking, right out of the box.

You can search logs, filter, build dashboards, set alerts, all inside the same tool. It's open source, built on OpenTelemetry standards, so you keep portability. Deploy it where you like: self-hosted, cloud, or hybrid. Teams already use it globally in production.

So if you're tired of gluing observability tools together and want something you can deploy today and scale, go check out signoz.io. That's S-I-G-N-O-Z dot I-O.

Just an aside. I'm a big fan of OpenTelemetry. I do something called telemetry driven development. I literally use OTel locally while I do my test driven development. And I have this nightmare Docker compose file that I've cobbled together of all the different tools kind of in the space. I replaced them all recently with SigNoz, so I would definitely check this tool out. It is very, very cool. Looks fantastic. And as I mentioned, you can self host and it's completely open source.

Kelsey:

I have this problem when I open things like spaghetti sauce. If you go to the store and you buy a jar of spaghetti sauce, that thing might expire, let's say, a year from now, right? No problem. You put it in your cupboard and it says you have until October 2026. You're like, "I'm good."

So you open the spaghetti sauce and you use two teaspoons of it for whatever you're making and you put it in the refrigerator. Now, you don't label it. When did you open it? So a month later you're like, "I know this isn't still good till October 2026." That's what it would have been unopened. When did you open it? And so now you have to do this dance of, "Is this safe to use?" So what do you do? You open the lid and there's nothing growing on the top. It's probably fine. It's probably fine, right?

Or you've got to try to carry it in your head, "I opened this last week," so you know that you opened the spaghetti sauce last week. Now, does your wife know that? And if you're like me, if I look at something and I can't recall if it's still good, it's going in the trash. We're not going to the hospital to save $3.28. Not happening.

And so I think people struggle with this because their infrastructure feels like a bunch of opened things that no one has the context of. So it just sits there. In the worst case, you end up going to buy another spaghetti sauce. You're afraid to throw the old one away, but you don't trust it, so you open another one, and now you have four open spaghetti sauces in your refrigerator.

And I think a lot of people have been carrying this black box mentality for so long that this is why production is so fragile, because it's just sitting there, it works. We don't quite know why and we lose that context over time. So I like what you're proposing.

Cory:

It reminds me also of like... a lot of the benefits and the speed that we see in customers at Massdriver, I think companies can get that by just starting to lean into this idea of a database at the heart of operations. That's like one of the key things that we have. What's wild is like every organization has a ton of data around their operations and it is just scattered to the winds, right?

It's like pre-Sheets, like before you could just go to, like, the Sheets site. It's just like, "Ooh, I have folders upon folders upon folders of spreadsheets, and some of them say final and some of them say edit two." And it's just like, I don't actually know which spreadsheet I'm looking for on my computer. Like, I really wish this was all organized with some sort of versioning and history. And you might be saying, "We have versioning and history because we have GitHub Actions." It's like, "Yeah, yeah, but where's your data?" Are you going to write a script to go through every single Git commit trying to find when that instance type was changed? Then you have to cross-reference and make sure that the GitHub action actually succeeded. Right? Like, it's just, it's so hard.
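That Git-archaeology exercise looks roughly like this. A hedged sketch; the instance type is an example and the commit SHA is a placeholder:

    # Use git's "pickaxe" to find commits that added or removed the
    # string "t3.micro" anywhere in the Terraform files.
    git log -S 't3.micro' --oneline -- '*.tf'

    # Then inspect each candidate commit...
    git show <commit-sha> -- '*.tf'
    # ...and separately cross-reference your CI system to confirm the
    # corresponding pipeline run actually applied successfully.

Two tools, manual cross-referencing, and still no authoritative answer: that's the data loss he's describing.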

You have tons of data that is just lost in the most mediocre database and like execution environment possible.

Kelsey:

You know what? People are very afraid to just say, "What is your operational model? Do you even have one?" The answer is probably no. It's probably brute force, but it works though. And the truth is, if you sat back and just say, like, "Is that the best we can do?" The answer would be no. And now you have to get intentional about it.

And I think a lot of people are just really trying to delay that intentionality decision for as long as possible. Because then we have to take a look at our stuff and do a critical rethink. Right? Like, "Oh my God, this is a mess." So now that you've admitted it's a mess, now you got to clean it up. And I think a lot of people would just rather avoid that.

All right, I'm going to spin down this section, because the next part is: you are pretty vocal on LinkedIn. Anyone that follows Cory on LinkedIn... I think you kind of get to a point where you really do express what some people are thinking. We see these product announcements that are promising the world. You know, the same thing you had before, but the cloud version. Or there's promises of, "You don't even have to worry about these low-level details. They'll take care of themselves." And you seem to have a serious allergic reaction when you start to see tools that try to overpromise the world, or you can kind of see through what you would probably consider snake oil in these cases.

As someone who's building products in this space, you're looking at all the approaches. I want to go through some of the approaches. You don't have to name companies, you don't have to name specific products. But what approaches do you feel like are a dead end?

There are people who are sitting back saying there's Massdriver, there's tool A, there's tool B, there's the tools we were using 10 years ago. And you really, really feel passionate that some of these approaches are literally a dead end, and they can never meet the challenges we've been talking about for the last hour.

Maybe let's go through the list of the top three things you would consider dead ends. Like, either they work for a certain scale, like you can just use them, you've got three servers, it doesn't matter what you pick. And then things that, maybe if you push it across, like, a team at a large company with really serious complexity, it's just a dead end.

What would those top three things be? And maybe let's tackle them one by one just to kind of back up like some of the stuff we see on LinkedIn without the context.

Cory:

Yeah, geez. So this is a hard one for me, because I honestly don't think too many things are definitive dead ends. And what do I mean by that?

I mean, I've talked to people in the past week that were like, "Oh, Ansible's dead." And it's like, Ansible is definitely not dead.

I mean, you know, you could even look at, like, older tools that maybe feel more retired, and still say that they're probably not dead if somebody has a sophisticated investment in them. Now, where do things die? When you look at an org and you're like, "Operationally, in the cloud, we're not super effective." What does super effective mean? Well, whatever those KPIs are for your business: it's too expensive, we have too much downtime. When you start to see these other problems stemming from day two, it's interesting to look back on how you got to day two.

And I think that's where it becomes more of a dead end - is it a dead end for your organization?

Honestly, I think CI/CD is a dead end for managing infrastructure, and GitOps as well. Now, I think GitOps...

Kelsey:

Hold on, hold on, hold on. We got to break this down. Oh no, we got to break it down one by one. Because think about it: CI/CD for creating software. Run your tests, make a build, take that release, put it in a repository. It just makes total sense, right? That is the nature of a good pipeline. Very fixed, low adjustments.

Then people take the same type of pipeline technology, whether it's Jenkins or GitHub Actions, and they say, "You know what, I'll add a step to deploy this across five regions, different time zones, and configure all the other things. Why not?" And you're saying, "I don't think that's going to work. That's a dead end."

And I'm pretty sure a lot of people listening are like, "We just started building our deployment pipelines. What do you mean this is a dead end?"

Cory:

Yeah. Okay, so let's do this.

So what's funny about it is, like, when you look at CI/CD, what is CI/CD? Let me ask you this, what's the D in CI/CD stand for? You saw this a few weeks back, right?

Kelsey:

Yeah, I mean, I think there's a lot of ways to think about it, but I've always thought about it as continuous delivery. The delivery component, though, was never the last mile to Production.

I had always thought continuous delivery was: you wrote an app in Java, you have to compile it with all of its dependencies, maybe run it through some tests, and delivery for me stopped around "it's in a repository." For you that could be an RPM, for some people that could be a Docker container. And so now there's a releasable thing that's sitting in a repository.

Now what you do with it is up to you because it could be destined for staging, it could be destined for Prod, it could be destined for a developer's laptop or a third party service. So that's just where the journey begins. So I always thought about the D part as continuous delivery of an artifact, not the end game for Production.

Cory:

Why weren't you the first comment on that post? This is actually the conversation I wanted to have around CI/CD when I made this post. But what happened was it turned into a shit post.

Everybody was like, the D stands for distraught, destructive, dangerous. Like, everybody just brought their baggage. Why did they bring their baggage? Because it's fun to bring baggage to the internet. But also, like, no one's ever super happy. Nobody's like, "Man, this CI/CD stuff was super fantastic." Devs weren't even really excited about it when it was just running their tests, right? Like the CI part: do we want to integrate this software with the rest of the software? Does it test good? Does the lint look good? Does the code look good?

It was already a painful place. And then the delivery did turn into deployment for a lot of organizations. It's like, "Oh, GitHub Actions deploys my thing." It's not putting something into some sort of an artifact registry. And so I think we misused it a bit.

Now, why I don't think it's necessarily great for infrastructure is you lose a lot of details when your Terraform's executing there, right? And you see this solved by many tools.

Some that I think are great, some that are competitors that I've absolutely recommended potential customers of Massdriver go buy instead of us... I'm not, you know, super prescriptive, like you have to use my thing... but there, we're getting feedback in a comment. And so the comment's like, "Hey, this is the plan, this is what we're going to do." And you're like, "Great, that looks great." Merge it. And it's like, it didn't work. Now I've got something broken in main. Cool, right? And so now it's like, "Okay, well, hopefully people aren't cutting things off of main right now, cutting something broken off of main."
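For context, the shape being critiqued is roughly this "plan on the PR, apply after merge" workflow. A hedged sketch using common GitHub Actions building blocks, not any specific vendor's setup:

    name: terraform
    on:
      pull_request:
      push:
        branches: [main]
    jobs:
      plan:
        if: github.event_name == 'pull_request'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
          - run: terraform init -input=false
          - run: terraform plan -input=false   # often posted back as a PR comment
      apply:
        if: github.event_name == 'push'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
          - run: terraform init -input=false
          - run: terraform apply -auto-approve -input=false   # a failure here lands after the merge

The apply only runs after the merge, which is exactly how a plan that looked fine can still leave main broken.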

And it's also this like choke point. If I've got my infrastructure managed and it's being provisioned from a GitHub action, what environment is that? Can I get a preview environment? Can I get a preview environment of a VPC and a Kubernetes cluster? Probably not, that's expensive. Okay, I want to test this thing out and see how it changes. Like is it just a plan? Is that all I'm actually getting in my GitHub action - just the plan?

Or are you going to do an apply and show me like what would have happened? And it's like if you applied it, what was it? Is that a net new Kubernetes cluster? Because that's very different than the one that's running in Prod, right? That may have had some other services deploying things with Argo, right? It's not a real representation of your world.

And so I think that CI/CD as a place to run infrastructure from, to provision infrastructure from, is just fundamentally not the right fit. I think that D should be delivering some Infrastructure as Code to a registry. And that's how Massdriver works: we're a registry for Infrastructure as Code modules.

As that code comes in and becomes data to us, it allows us to do a lot of really interesting things and expose a lot of interesting points for you to hook into as an Operations team. But you're just losing that data in the logs of GitHub Actions or in the logs of Jenkins.

It's funny to think about the transition from the three pillars of, like, "observability" monitoring to OTel and the structured data that we're sending out. We all kind of agree, like, "Eh, well, structured data is a little bit of a pain in the ass," but I will take structured logs or traces any day over just a dump of raw text.

But what are we getting in GitHub Actions? Now, you might have some structured output from your Terraform or whatever, but it's structured output that just goes to a log someplace you don't have access to. It's hard to analyze, it's hard to get information from, it's hard to glean things from. And you're always kind of working on either an approximation of production, or maybe net-new infrastructure, or you're merging things that may be broken to main.
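Recent Terraform versions can already emit machine-readable events; the gap is that in CI they just scroll into a job log. A hedged sketch, assuming Terraform's JSON UI output and jq:

    # Emit structured JSON events instead of human-readable text...
    terraform apply -auto-approve -json > apply.jsonl

    # ...then pull out, say, the change summary events.
    jq 'select(.type == "change_summary")' apply.jsonl

    # In a typical pipeline, apply.jsonl dies with the job log
    # instead of landing somewhere queryable.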

So I think it creates a lot more heartache than a system designed for executing Infrastructure as Code. So that's why I think CI/CD is a dead end.

Kelsey:

You know what? So what's dope about that is that a lot of people that are using CI/CD in the way you described, they're shell scripting again. Right? They're just writing a bigger shell script with a bigger scripting execution framework. And it has become their shell and the pipeline has become their script.

And I think the thing that made me super excited about Kubernetes... that a lot of people didn't understand why I was so excited, right? This is someone who used to work at Puppet Labs, used to work at CoreOS, and had been a system administrator years prior... When I saw Kubernetes, it's like, "Oh, I get it now." You finally have a better last mile technology that says, "If you give me what you would like your infrastructure to look like, let me be responsible for doing the deployment part. I will figure out how to get a container on a server. If the server goes away, how to put the container back." So that means your pipeline then becomes even more concrete. The artifact is maybe for some people the Docker image, but for a lot of people it's going to be those manifests. Those manifests become inputs.

And I think some people learned early on it was in your best interest to maybe resolve the manifest. Maybe that meant taking a container ID and putting it in a deployment object. But then the object was the final artifact, and some people stored it somewhere. And then Kubernetes may watch to say, "Wow, there's a new artifact, and now I'm going to resolve the state of the world."
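A "resolved" manifest in that sense might look like the sketch below: no templating left, and the image pinned by digest rather than a floating tag. The names, registry, and digest are hypothetical:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: checkout
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: checkout
      template:
        metadata:
          labels:
            app: checkout
        spec:
          containers:
            - name: checkout
              # A digest, not a tag: this artifact always means the same bits.
              image: registry.example.com/checkout@sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
              ports:
                - containerPort: 8080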

But that only works for one cluster. And maybe that gets us to the next kind of, maybe dead end that you talked about, which is GitOps. Right?

Trying to describe your entire infrastructure using YAML files delivered via Git also becomes kind of a dead end in your mind. So CI/CD, we talked through how that's kind of a dead end: trying to use it as a replacement for your shell scripts. But then GitOps falls short in some other places.

What does that look like on the GitOps side? Because you think you're doing something now where you've got all these YAML files describing the world, but then you're still falling short somewhere. Where?

Cory:

Yeah, I'll have to link in the show notes to my... Have you seen my “Gitopscracy” video? Did you see that?

Kelsey:

I did not see the Gitopscracy video.

Cory:

I took the executive scene from Idiocracy and I redubbed it all about GitOps. But it's funny because... and I have a blog post I'm working on about this now.

I've talked about GitOps not being the right solution in the past, but the thing that's interesting with it is: developers didn't want it. And I'd venture to say, at your organization right now, if you're doing it, your developers didn't want it. Like, GitOps is a choice that Ops made. Ops made a decision and said, "This is what you're going to do, developers."

Now, I have to say, separating the idea of deploying your application from your infrastructure... like, I think GitOps can absolutely make sense for deploying applications. But as far as, like, managing a database... And what do I mean by this, that Ops "made this decision for" developers?

Well, how I see a lot of teams get to GitOps is they're like, "Ooh, I can have some reconciliation." Why do we want reconciliation? Because there was drift. Why was there drift? Because the thing that you gave developers was too hard to use.
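The reconciliation loop Cory is describing is the kind of thing an Argo CD Application turns on. A hedged sketch with hypothetical repo and app names:

    # Git is the source of truth; the controller continuously reconciles the
    # cluster back to it. selfHeal reverts out-of-band drift, prune deletes
    # resources that were removed from Git.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: my-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/deploy-configs
        path: apps/my-app
        targetRevision: main
      destination:
        server: https://kubernetes.default.svc
        namespace: my-app
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

His point stands either way: the reconciliation is often there to paper over drift, and the drift exists because the original interface was too hard to use.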

Thanks for listening to the Platform Engineering Podcast.

In part three of this series, we'll talk about how CI/CD is not the right tool for the job of managing cloud infrastructure, how Kubernetes is just the last mile, and the real limits of GitOps. We'll talk about how you can fix many of these issues by treating infrastructure as data.

And we'll also get into why TicketOps hits a wall and how Massdriver tackles the problem end to end. Don't miss it. Please like and subscribe.


About the Podcast

Platform Engineering Podcast
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.

Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”

Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.

Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.