For teams deeply rooted in DevOps practices, the concept of platform engineering opens up new horizons and provides a fresh perspective on managing infrastructure and applications. But what exactly does platform engineering entail? Today, Cory O'Daniel and Dave Williams talk about platform engineering from a developer’s perspective.
In this episode, we're going to talk about platform engineering from the perspective of a developer. The canonical definition of platform engineering is building, designing, and maintaining the infrastructure and tooling for software developers to build software applications. The major goal is increasing developer productivity, standardizing development processes, and maintaining the high scalability and reliability of those systems. What would you say is missing from the canonical definition of platform engineering?
The biggest thing is product mindset. One of the cool things that happened going through YC was that mantra of, “Build things that people want.” Approaching the management of infrastructure and managing all of the requirements of security and compliance as a product that people are compelled to use is an important facet of that move to platform engineering. The difference from that dead-loss mindset of “build it as we need it” is building a thing that people are compelled to use and want to use.
As a developer joining a company that has platform engineering in place, I feel like that's an exciting opportunity to know that you'll truly be able to focus on the programming languages and dependencies you need, and getting the work done that you're excited about knowing that you don't have to deal with the tedium of operations, where you touch on some of these tasks for managing infrastructure.
My career was fun. I came in through microcontrollers and embedded systems. At the beginning of my career, the hardware was so important. Getting into some of these companies that were providing software over the internet, the culture was a lot different. It was about building the product and the thing that people touched. People were very disconnected from the hardware. That gave rise to the DevOps movement. The stuff matters. We need to be able to manage it effectively. The longer that I go being a founder and entrepreneur, I'm back to this point where I need a thing that's going to work and I need it to be modular enough to handle any changes to the market that I need to address.
How did your software development background prepare you for the challenges of platform engineering?
I ended up being a software engineer who straddled the line between operations and software because of that embedded systems background. The hardware mattered. I've always had a product mindset when it came to building automation and things like that. The tooling wasn't there to build the product.
It sounds like the product mindset is one of the key things in developing internal developer platforms, but what principles or practices have you found particularly relevant or even critical in platform engineering?
The biggest principle is the need to abstract away the hard stuff. Operations are challenging. Security and compliance become even more challenging as we distribute systems. These things are non-negotiable for most businesses. You want to keep people's data safe and make sure that any governing body is going to be okay with the infrastructure that you're supplying. For software engineers, it’s the need to scale. The ability to take action at 1:00 AM when the alarm goes off is important. The biggest thing is that mindset of giving people what they need and only what they need, not bothering them with the complexities of the domain that they're operating in.
It's interesting you think about abstraction because I feel like when we first started using the cloud, the idea was it was this thing that was going to make things simpler than a data center and racking but the cloud has grown to be fairly complicated. You can either look at something as simple as running maybe Postgres on RDS Aurora. There are a ton of parameters that are exposed to you as a developer that may never matter. You have this great quote that you've used around the office, “The goal of the cloud isn't to be simple. It's to be capable.” Can you tell me a bit about that?
If you talk to any operator who came from data centers, they will never tell you that the cloud is more simple. They're always like, “It’s easy to tail the logs on this box and figure out what was going on.” The magic of the cloud is it's risk-free. Not having to buy a bunch of hardware to get your startup off the ground is awesome.
It's there to meet a huge variety of use cases and run it on a massive scale. That fundamentally requires you to open up the system as much as humanly possible. The simplicity has to be on your operations team and developers, making sure that you're finding these standardized ways to run the cloud infrastructure that works for you and making sure that that's easily repeatable.
How has your approach changed when developing internal developer platforms versus more traditional software B2B or eCommerce?
I don't know that it's changed and that's the cool thing. Think about what eCommerce did to commerce. I remember ordering bicycle parts from mail-order catalogs when I was a kid. You call somebody and you’re like, “I need these things.” Thinking about what's going on behind the scenes there, they're probably calling a person in a warehouse, “Do we have these things?” There are credit card processes that take 3 to 4 weeks to get your money.
Finally, they're like, “We got the money. It's time to go ship this.” You're looking at an eight-week process. eCommerce came in and cut this down to a series of calls to automated systems. That's neat. We need the same thing in operations in the cloud. It’s that ability to say, “I need this piece of inventory,” and have it be a collection of calls that comes out the way you expect.
How do you view the relationship between application development and platform engineering, especially given your unique experience in both fields?
Approaching Massdriver in that product engineering mindset, we created a pluggable system. You can almost think about it like eCommerce for infrastructure where we have a catalog of things. We have a fulfillment mechanism for the cloud infrastructure you're getting. We have a way to audit what has been created, what has been destroyed, and where it's gone. It's the same principles in the same approach as it would be to building anything else.
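As a rough illustration of the catalog, fulfillment, and audit pieces described above, here is a minimal Python sketch. It is not Massdriver's implementation; `CatalogItem`, `AuditEvent`, `Platform`, and the `order`/`destroy` methods are hypothetical names invented for this example, and "fulfillment" is reduced to bookkeeping where a real platform would call out to provisioning tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CatalogItem:
    """One orderable piece of infrastructure (hypothetical shape)."""
    name: str
    description: str


@dataclass(frozen=True)
class AuditEvent:
    """A record of what was created or destroyed, and where."""
    action: str        # "created" or "destroyed"
    item: str
    environment: str
    at: datetime


@dataclass
class Platform:
    catalog: dict[str, CatalogItem]
    audit_log: list[AuditEvent] = field(default_factory=list)
    running: set[tuple[str, str]] = field(default_factory=set)

    def _record(self, action: str, item: str, environment: str) -> None:
        self.audit_log.append(
            AuditEvent(action, item, environment, datetime.now(timezone.utc))
        )

    def order(self, item: str, environment: str) -> None:
        """Fulfil an order for a catalog item in the given environment."""
        if item not in self.catalog:
            raise KeyError(f"{item!r} is not in the catalog")
        # Assumption: a real platform would provision cloud resources here.
        self.running.add((item, environment))
        self._record("created", item, environment)

    def destroy(self, item: str, environment: str) -> None:
        """Tear down a previously ordered item and record that it happened."""
        self.running.remove((item, environment))
        self._record("destroyed", item, environment)
```

A developer would only ever call `order("postgres", "staging")`; the audit log is filled in as a side effect, so nobody has to remember to record what happened.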
That idea of calling the bicycle place and looking for the parts you need, I feel like it's very similar when you're developing a new feature and you need a cloud service. I frequently know the cloud service that I need to implement the feature. I need Dynamo versus maybe Redis or a relational database, but it's not quite as simple as going to an eCommerce site and pulling it off the shelf.
Even though we have these automated APIs for the cloud, it still feels like I'm trying to get a bunch of different parts. I want Postgres but I need KMS or IAM. I need these other services to be able to get the things that I need. I feel like that's not the way that a lot of developers think or should think. They should be able to think about the dependencies they need, not the dependencies of their dependencies.
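The "dependencies of their dependencies" idea is the same transitive resolution that package managers do. The sketch below is only an illustration under assumptions: the `REQUIRES` map and the `resolve` function are hypothetical, the service names are placeholders, and the map is assumed to contain no cycles.

```python
# Hypothetical map from a service to the supporting services it needs.
REQUIRES = {
    "postgres": ["kms-key", "iam-role"],
    "kms-key": [],
    "iam-role": [],
}


def resolve(requested, requires=REQUIRES):
    """Return everything that must be provisioned, dependencies first.

    The developer names only what they want; supporting services are
    pulled in automatically. Assumes the map has no cycles.
    """
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dependency in requires[name]:
            visit(dependency)
        ordered.append(name)  # appended only after its dependencies

    for name in requested:
        visit(name)
    return ordered
```

With this map, `resolve(["postgres"])` returns `["kms-key", "iam-role", "postgres"]`: the developer asked for one thing and the platform worked out the rest.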
We've solved this in the open-source world with packages but we haven't solved this one when it comes to the cloud. What is different about the way that you think about IAC development in this platform engineering world versus how typically somebody might make Terraform or Pulumi modules with regard to all these tertiary services?
Before we started Massdriver, one of the big things I thought about when it came to building a Terraform module or something like that was something being use case specific. It's like, “I don't need a Dynamo. I need a key-value store for this kind of scale.” That makes it easy to grab a module and make a clean API. Given this is the use case, these are the 3 or 4 fields that you need to be aware of. These are the kinds of things you need to monitor.
When it comes to IAM, you can craft the policies ahead of time. This DynamoDB will have a policy for reading and a policy for writing, so you're able to split that stuff without having to think about it. When creating something good for a catalog, you think through all those cases to make sure it's all prepackaged and limited enough in scope that a user can be confident they're grabbing the right thing at the right time.
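To make the "3 or 4 fields plus pre-crafted policies" idea concrete, here is a small Python sketch of a use-case-specific module interface. Everything in it is an assumption for illustration: the `KeyValueStore` class, its field names, and the policy dictionaries are hypothetical, and the action labels are generic placeholders, not real IAM action names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyValueStore:
    """A key-value store for a known use case, exposing only a few fields."""
    name: str
    expected_reads_per_second: int
    expected_writes_per_second: int

    def read_policy(self) -> dict:
        """Pre-crafted policy for consumers that only read."""
        return {"resource": self.name, "actions": ["read"]}

    def write_policy(self) -> dict:
        """Pre-crafted policy for consumers that only write."""
        return {"resource": self.name, "actions": ["write"]}


def policies_for(store: KeyValueStore, needs_read: bool, needs_write: bool) -> list:
    """Pick the ready-made policies an application should be granted."""
    granted = []
    if needs_read:
        granted.append(store.read_policy())
    if needs_write:
        granted.append(store.write_policy())
    return granted
```

Because the read and write policies ship with the module, an application that only reads sessions never has to be handed write access, and nobody has to author a policy by hand.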
What drew you to platform engineering from software development? What insights can you share with other engineers who are interested in the same path?
The need to move fast makes platform engineering appealing. My experience on both sides, as a software engineer, a product engineer, and an operations professional, is the ad hoc nature of DevOps where it's like, “I'm involved in this project plan.” I'm blocking people and people are blocking me, going back and forth. That was frustrating. In almost every operations team I've ever seen, there's a lot of weird animosity and a throw-it-over-the-wall culture, despite the fact that that wasn't supposed to happen in the operations world.
For me, it’s that ability to know that I need a piece of infrastructure. I can grab it and know that the company is going to feel confident. It behaves in a way that meets their standards. I can start writing my software and meet the needs of my customers. That's the most important thing that platform engineers could provide.
It's the thing that I feel developers want. You've got 40 hours in your week to build some features and value. Everybody wants to add value. A lot of times as a developer, you can get caught up in all these tertiary things and negative engineering tasks that you have to do to get to the work that your product manager is asking for.
Think about things like redundancy. We have higher uptime requirements than at any time in the history of the internet. Everybody expects everything to be fast and up all the time. Gigantic companies like Amazon and Uber are up all the time and handling massive scale. I want to be able to write software quickly that can handle that kind of scale and can be that redundant. Getting buried in the nitty-gritty of Kafka and all that stuff can take a long time, even for someone who has that kind of brain, when they aren't focused on that solely.
As a company dogfooding products, how has using Massdriver to manage infrastructure changed your team's relationship with cloud operations and infrastructure?
I've lived through a lot of eras in software development on the internet. I've had the server room in the air-conditioned room at the office. I've lived through the early days of the cloud and one of the most interesting times, which was the beginning of the Heroku period when they started to lean into the Rails deployment-environment-as-a-service thing. That was magical at a time when our software was limited in scope. It’s the idea that a technology like Rails could be your security layer, caching layer, or everything layer.
We started to realize this wasn't a thing that could exist anymore. We needed to start distributing our applications. Heroku fell apart and we went into this DevOps world. We were writing a lot of infrastructure code, scripts, and things like that. It honestly started to feel worse and worse. The cool thing about platform engineering and dogfooding Massdriver is it feels a lot more like the Heroku days where it's like, “I need to run an application. These are the six Lego blocks that I need to do very specific functions.” It's compelling as a software engineer.
From a team's point of view, it becomes easy to start to see the organization-wide patterns. Everybody is queuing the same way because it's easy. It's just like grabbing some Lego blocks. With IAC, it's difficult to go with Terraform. It's like, “What combination of modules and incantations do I need to get this thing that I need?” It's given people a better view of the infrastructure and the patterns while simultaneously being so much more accessible to engineers.
Dave, thanks for coming and talking to us about platform engineering.
It’s my pleasure.