Episode 2

Published on: 14th Feb 2024

What Is Platform Engineering? A Cloud Operation Engineer’s Perspective

For teams ingrained in DevOps practices, Platform Engineering ushers in a broader horizon, a fresh perspective on managing infrastructure. But what does Platform Engineering offer, especially for those adept in cloud operations? In this episode, Cory O’Daniel talks to Chris Hill about platform engineering from the perspective of a cloud operations engineer. From the importance of security and compliance and the challenges faced by developers to the impact of using Massdriver to deliver infrastructure management to engineers, Chris shares insights from his decade-long experience. Tune in for an exploration of platform engineering's evolution!

Guest: Chris Hill, COO at Massdriver

Transcript
Intro:

You're listening to the Platform Engineering Podcast, your expert guide to the fascinating world of platform engineering. Each episode brings you in-depth interviews with industry experts and professionals who break down the intricacies of platform architecture, cloud operations and DevOps practices. From tool reviews to valuable lessons from real world projects to insights about the best approaches and strategies, you can count on this show to provide you with expert knowledge that will truly elevate your own journey in the world of platform engineering.

Cory:

Hi, I'm Cory O'Daniel, CEO and co-founder of Massdriver.

Chris:

And I'm Chris Hill, COO and co-founder of Massdriver.

Cory:

Today we're going to talk about platform engineering, specifically from the point of view of an operations engineer.

Before we get started, for people that aren't familiar with platform engineering, the canonical definition is designing, building and maintaining the underlying infrastructure and tooling for creating software applications. Overarching goals are to increase developer productivity, standardize development processes and ensure the scalability and reliability of the system as a whole. Chris, what would you say is missing from the canonical definition of platform engineering?

Chris:

I would say security and compliance. In order to properly run all of these applications, if you're going to scale anything, you have to factor in the security element of it. And as these corporations are growing, compliance is going to become more and more of a concern, particularly if you're going to be working in healthcare and you have to deal with HIPAA.

If you're going to be working in the EU, or in California, which now has these compliance requirements around protecting user data, and your systems don't account for that, you're going to constantly be in a state of chasing after the compliance requirements that you forgot to implement originally.

Cory:

Yeah, I feel like as a developer, it's a lot of focusing on my features, getting something ready for production and then getting caught not thinking of something security or compliance related upfront.

Chris:

Absolutely. I mean, as you're developing a product, a lot of times it's going to be prototyping to begin with, right? So you're going to be creating something, you're going to throw it out there, see if it works.

And then afterwards you're like, “Well, we'll figure that stuff out later.” And unless you have a formalized process around figuring that out, what's going to happen is the next time you guys are doing an audit, the CISO is going to be chasing you down. It's not going to be chasing down the developers, it's going to be chasing down the operations engineer and say, “Why is this configured this way? Why isn't it configured this other way?” And now they're going to be interrupting your job. They're going to be coming in and saying, “Hey, this needs to be changed.”

And you might not even be that aware of the system. You might not be aware of how it was configured, why it was configured that way. And now this is dropped on the top of your work pile of, you have to go figure out how this thing is supposed to be configured.

And you may then have to figure out how you're going to change the system to be able to support the compliance requirements that just happened to show up.

Cory:

Yeah. And it can be difficult as an engineer to like know, I mean, funny thing with compliance is it is pretty standard, right? And if we can standardize that in our platforms, that's great.

But even security, like we have common security tools that we can run on our applications. And if those are part of our CI/CD pipeline or part of our internal development platform, that takes a lot of that thought out of the developer's hands, lets them focus on features. How have your experiences as a cloud operations engineer influenced your approach to platform engineering?

Chris:

I've been working as an operations engineer for close to a decade at multiple companies. And one of the main things I realized is all of these companies are doing pretty similar things with the same technologies. I mean, they're operating in the same public clouds, but they're approaching the problem and they're solving it as if it's a bespoke solution, as if what they're doing is a particularly unique way of doing it.

And you're seeing the same patterns and start realizing like, well, do we need to approach this problem this way? It's very man hour intensive. It's very talent intensive of having to bring people in and treat every single problem as if this is going to be a unique solution that we have to discover and design on our own.

Cory:

Yeah. And in a world where we say the perfect ratio of operations engineers to developers is one to 10, how does that approach, where every single one of these systems you need to bring up is a unique or bespoke solution, affect the scale of your operations teams?

Chris:

Well, if you're an organization that has that ratio, I would say you're lucky. Like most of the time, the operations team, if it isn't understaffed, it certainly feels understaffed because you're stuck maintaining these solutions. And look, I get it as a company, when you're an operations engineer, you're removed from the product.

I would say you're two or three layers removed from the thing that's making the company money. And so it's sometimes even hard from an engineering perspective to be able to justify the cost of having this engineering staff that isn't directly contributing to the product. So now you're running an understaffed team, but having to build and maintain all of these bespoke solutions.

And so you have to have some way to be able to scale this work, especially as the developers, the number of developers is growing, but the number of operations engineers is not, at least not at the same rate.

Cory:

And it seems like every organization eventually has a platform. It's just whether or not it was planned and thought through, right? A lot of it is just these bespoke systems that are kind of stitched together.

What skills or knowledge are important in a cloud operations role when working as a platform engineering team, building an internal platform?

Chris:

I think one of the things is just going to be as much experience as you can have with the tools that you're going to be working with. Because if you're going to be building a platform that's going to be sitting between the operations engineers and the developers, which is what you should be, that should be the goal. You have to know what of all of the configuration of these things that we're building, what of this should be configurable?

What should be exposed? What should the developers be able to work with? And a lot of that just comes from just pure experience, knowing how to work with the tools.

I mean, you're going to have to have some form of IaC. You're going to need Terraform, Pulumi, CloudFormation. You're going to need something like that.

You're going to need to understand runtime. You're going to need experience in these infrastructure-as-code tools. Anytime you're going to be managing infrastructure, you should be doing it with some form of declarative language.

The other piece that's unique to the platform side of it is thinking about how can you view what you're doing and what you're building as a product? Instead of thinking of it as a set of tasks and a set of work orders that's coming down, how can you approach your job like you are building a product that can be used as if the development team, the software team is your customer? What would you provide to them and what would you be building to serve them and their needs?
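
Editor's note: as a rough illustration of the declarative approach Chris describes, the sketch below compares a desired state against the current state and works out what would need to change. It is a toy model only. The `plan` function and the resource names are hypothetical, and this is not how Terraform, Pulumi, or CloudFormation are actually implemented.

```python
def plan(desired: dict, current: dict) -> dict:
    """Compute the changes needed to move `current` to `desired`.

    Both arguments map a (hypothetical) resource name to its configuration.
    """
    return {
        # Declared but not yet running: create it.
        "create": sorted(desired.keys() - current.keys()),
        # Present in both but configured differently: update it.
        "update": sorted(
            name
            for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
        # Running but no longer declared: delete it.
        "delete": sorted(current.keys() - desired.keys()),
    }


if __name__ == "__main__":
    desired = {"app-db": {"size": "large"}, "app-cache": {"nodes": 2}}
    current = {"app-db": {"size": "small"}, "old-queue": {"fifo": True}}
    print(plan(desired, current))
```

The point of the declarative style is that the engineer only writes `desired`; working out the steps is left to the tool.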

Cory:

Thinking of it as a product, especially as an operations engineer, it's a bit of a double-edged sword, right? Because your organization's looking at it and thinking we have a key product that we're building for our customers. Do we have the time or resources to build a second product for our engineers?

But this is one of those things that if you invest in it, it will yield great returns. These engineers that work on the operations side that tend to do a lot of tasks and feel like they're doing ticket ops, like this is the real value that is surfaceable and understandable in a business. Would you agree?

Chris:

I absolutely agree. The big issue that these companies are going to run into is exactly what you said. Can you build and maintain two different products at the same time?

And you know, you see a lot of these really large companies that they are able to advocate for a lot of the things that they're doing, but they have the large workforce to be able to do it. So you need to find ways with constrained resources to be able to build these products. And sometimes you have to use existing products.

You have to find ways to leverage existing things to be able to make the efforts that you're putting in and multiply them to meet the needs of the developers.

Cory:

Yeah. And like any product in agile software, you can do it in steps. Like you can increment, you can iterate.

It doesn't have to be a platform that you go and build in a corner and then bring to your developers and say, “Here it is.” Like you should really have these developers involved, trying to figure out like, “What can we do today to optimize your development workflows and start to standardize and centralize the management of this infrastructure?” How has your perspective on infrastructure and scalability evolved since leaning more into platform engineering?

Chris:

Whenever I'm working on something new, I started looking at it through the lens of, “What does a developer need to change about this?” Previously working exclusively as, you know, just purely on the operations team, you'll interface with the developers to gather requirements, understand what they're doing, but then you kind of go back and you build it and you maintain it on your own. And none of this really is going to get exposed to developers.

I mean, maybe it depends on your organizational structure and how you guys manage the teams. But a lot of times for me, I just worked on it kind of by myself and I maintained it by myself. And looking at it now and saying, “What of this could be configurable?”

And there's a lot of times that you can build things. I think this is one of the best parts of it is you can build stuff now. And I feel more of a sense of completeness when I'm building something new.

When somebody asks for a new capability in AWS, I go and I research and I implement it. And when it's done, it feels a lot more done because now I'm able to release this and it's able to be used, not just through me, the developers can come and they can deploy it. They can view it.

They can monitor it. They can understand it. And I'm done with that work. Now I can move on to the next thing.

Cory:

A lot of times when I've been handed the thing that I need to deploy on, it's like, this is what I've been given, right? And with this idea of platform as a product, it feels easier for me to acknowledge that I can give feedback.

I'm like, we need to change the way this thing works to better serve my team or maybe create a set of options here for how I'm running an ML workload versus running some transactional workload. And that can feel very different than ticket ops where somebody's just saying, “Hey, here's, put your thing in a container and run it.” Right?

So remember when we were first getting this idea off the ground, you were actually a very hard sell. What motivated you to shift from cloud operations to platform engineering? And what advice would you give to other operations engineers that are either considering this as an alternative or being tasked with trying to figure out how to scale their operations team with platform engineering?

Chris:

When I first heard of the idea of Massdriver, my concern was that my skill set is too unique, too specific to be abstracted into a platform. At the time, I didn't think a platform could do it. And this isn't the idea of we can get rid of operations engineers. Certainly not.

What I realized is that the approach that we were taking with Massdriver was connecting these individual pieces of infrastructure together. And a lot of times that is already how things were being represented and being able to codify the relationships between things and identify the areas where there was a lot of work that was just tedious and repetitive, important, but tedious and repetitive. And I think you see this in every industry, especially with technology is that originally problems get solved with human power.

You just throw as much human capital at the problem as you can. And from there you start realizing, “Okay, what of this is simple? What of this is heuristic and repeatable and deterministic?” And those are the sort of things then you can start automating, you can start simplifying and eventually you can then start shifting it and solving these problems with technology instead of solving it with people.

Probably the operations field in this industry has been ripe for this actually for a little while. I think this is why you're seeing so much interest in this is there's almost this acknowledgement across the industry of there's a lot here that has been made overly complicated and we should start using technology to make these tedious things a lot easier, a lot simpler, a lot faster, a lot more scalable.

And that is the biggest appeal. Platform engineering is what's going to solve that.

Cory:

As a company that dogfoods our own product, how has using Massdriver to deliver infrastructure management to engineers changed the way you work?

Chris:

Going back to what I was talking about with feeling complete, like a completeness when I'm done creating what we, you know, what we call our bundles, you know, this piece of infrastructure now that can be redeployed, reused. And it's funny, I'll go in and I'll look at the infrastructure that we have running in Massdriver and I'll be like, “Oh, I didn't know we were running that. I didn't know that's how this thing worked.”

There's almost some degree of separation I have now between the developers and their needs. And I'm focusing more on giving them the tools that they need to be able to do their job instead of the ingredients and mixing it and baking it and presenting it. And there really is self-service that's going on.

I'll go in and I'll see pieces of our system that I didn't know were using pieces of technology that I built. And it's actually really cool to see that, that these things that I'm making, I'm not having to be directly involved in the process. It's like the hours I spent originally creating this have now been multiplied and are being used by the developers to solve problems.

And I don't have to be directly involved in that anymore. I've enabled them to be able to do their job fast.

Cory:

Yeah. And that delivery of “Here is something where I've codified my expertise, not only into the bundle, but into how it can connect to the rest of the system” feels a lot more complete than just handing somebody a bag of IaC where they have to fork it to make changes and the input and output interfaces just kind of get fuzzy over time. And you don't really know what this thing is being used for versus being able to say like, “This has a very specific use case. This is a bucket for ETL, this is a bucket for logging.” And you're able to kind of hide a lot of the unimportant details from that engineer and just expose to them exactly what they need to do to configure something for ETL versus long-term storage.
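
Editor's note: one way to picture the use-case idea Cory describes is a small set of presets with only a few settings exposed to the developer. The sketch below is an assumption made for illustration: `BucketConfig`, `PRESETS`, `EXPOSED`, and `bucket_for` are hypothetical names, the values are made up, and none of this is Massdriver's actual bundle format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BucketConfig:
    name: str
    versioning: bool
    expire_after_days: Optional[int]  # None means objects are kept indefinitely
    encrypted: bool = True  # decided by the platform team, never exposed


# Hypothetical defaults per use case; the numbers are illustrative only.
PRESETS = {
    "etl": {"versioning": False, "expire_after_days": 30},
    "logging": {"versioning": False, "expire_after_days": 365},
    "long_term_storage": {"versioning": True, "expire_after_days": None},
}

# The only setting a developer may override in this sketch.
EXPOSED = {"expire_after_days"}


def bucket_for(use_case: str, name: str, **overrides) -> BucketConfig:
    """Build a bucket configuration from a use case plus allowed overrides."""
    if use_case not in PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    hidden = set(overrides) - EXPOSED
    if hidden:
        raise ValueError(f"not configurable by developers: {sorted(hidden)}")
    return BucketConfig(name=name, **{**PRESETS[use_case], **overrides})


if __name__ == "__main__":
    print(bucket_for("etl", "raw-events"))
    print(bucket_for("logging", "audit-logs", expire_after_days=90))
```

The developer picks a use case and, at most, a retention period; everything else stays with the people who maintain the preset.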

Chris:

Absolutely. And I know previously you might look at it and say, well, Terraform has done that for a while with modules and yes, but the relationships between that module and the things that use it, those pieces weren't codified in the same way previously. The developer can simply think, “Is my application a consumer or a producer of content into this S3 bucket? Do I need to read, do I need to write, or do I need to do both?” They can focus on use cases. They can focus on what they need and have the trust that whatever they're deploying is not going to violate compliance requirements.

It's not going to violate the security requirements. And so everybody can operate much more freely with confidence. And I think that's what we want.
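
Editor's note: the read, write, or both question Chris mentions can be sketched as a mapping from an access intent to an IAM policy document. This is a minimal illustration, not Massdriver's implementation. The `Access` enum and `bucket_policy` function are hypothetical, and a real policy would likely need more than these two object-level actions (for example, permission to list the bucket).

```python
import json
from enum import Enum


class Access(Enum):
    READ = "read"              # the application consumes from the bucket
    WRITE = "write"            # the application produces into the bucket
    READ_WRITE = "read_write"  # the application does both


# Object-level S3 actions granted for each intent.
_ACTIONS = {
    Access.READ: ["s3:GetObject"],
    Access.WRITE: ["s3:PutObject"],
    Access.READ_WRITE: ["s3:GetObject", "s3:PutObject"],
}


def bucket_policy(bucket_name: str, access: Access) -> dict:
    """Return an IAM policy document granting only the requested access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": _ACTIONS[access],
                # Applies to objects inside the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("example-etl-bucket", Access.READ), indent=2))
```

The developer states the intent, and the platform decides which permissions that intent translates to.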

Cory:

Awesome, Chris. Well, thanks for taking the time to sit down and talk about platform engineering with us.

Chris:

Absolutely. Enjoyed it.

Outro:

Thank you for listening to this episode of the Platform Engineering Podcast. Have a topic you would love to learn more about? Let us know at cory at massdriver.cloud. That's C-O-R-Y at M-A-S-S-D-R-I-V-E-R dot cloud. Catch you on the next one.

About the Podcast

Platform Engineering Podcast
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.

Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”

Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.

Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.