Foundations of The Cloud With Brian Grant, Kubernetes

Episode Description

It’s episode four of the Platform Engineering Podcast’s special series on the Foundations of The Cloud! This time Cory O'Daniel sits down with Brian Grant, the original lead architect of Kubernetes, to dive deep into the history and evolution of Kubernetes in cloud operations. Brian shares his journey from working in supercomputing to joining Google and helping develop Kubernetes. He also provides insights on the importance of Kubernetes' declarative model, managing complexity in cloud native environments, and the extensive impact and future potential of Kubernetes. Tune in to learn more about the intricate details of platform engineering and the revolutionary developments in cloud native infrastructure.

Episode Transcript

Welcome to the Platform Engineering Podcast. I'm your host Cory O'Daniel. Today we have a very special guest joining us to discuss Kubernetes’ place in the history of cloud operations. Brian Grant was the original lead architect of Kubernetes and has been a key figure in shaping the project and the broader cloud native ecosystem. Brian, welcome to the show.

Thanks for having me.

Brian's Journey from Supercomputing to Kubernetes

Before we dive into the fascinating world of Kubernetes, I'd love to learn a bit about your background, how you got interested in distributed systems and container orchestration, and just kind of your journey that led up to the work that is releasing Kubernetes.

Yeah, earlier in my career, 30 years ago at this point, I worked in supercomputing. Like I worked on a climate modeling project at the Lawrence Livermore National Lab, for example. And so I had some background in that, and then I'd used some queuing systems, LSF, LoadLeveler, that type of thing in some of the earlier parts of my career. Then I came back to supercomputing and actually did high performance computing on GPUs way too early. And that brought me to Google through an acquisition. There wasn't a need for it at the time, so I had to find a new mission, which I undertook and did for a while. But then when I was kind of looking for my next project within Google, the container platform Borg was looking to bring in some new senior tech leads to the project. Borg started around 2004, end of 2003, and most of the creators of that project had moved on to other things because that was the way... Google was growing crazy fast. And so I and a few other folks came onto the project and started looking at what needed to be improved. The initial thing I did was related to what I was doing before that, which was making the code more multithreaded so that it could scale with the increasing core counts, more sockets, more cores per socket in the data centers and that kind of thing. And as part of doing that, I analyzed how Borg was being used. Like what requests was it receiving, what kind of workloads were running on it, what were people doing with it? And found that it was being used in ways it wasn't really designed to support.

Yes. Love it. We love it when they do that. Right?

After not too long, I started an R&D project called Omega to re-architect Borg. Initially the plan was to extract pieces from Borg and redesign them but, for various reasons, we ended up kind of exploring more broadly. Some of those ideas were eventually factored back into Borg but after about five years of that, cloud was becoming increasingly important as a priority and the focus was shifting in that direction. So 2013 is when that really started to escalate. Level setting, Google Cloud had App Engine for a few years by then and Google Compute Engine, the infrastructure as a service product, had been developed over the preceding few years and was approaching general availability. And Docker emerged at the same time.

Yeah.

So that is how I kind of got into that whole side of things. It just so happened that I spent a few years doing R&D on how to build such a system.

Yeah. And so this is all happening at the same time. Like big clouds exploding, more and more services are getting spun up, containerization is coming onto the scene. The initial release of Kubernetes was now ten years ago.

Yeah. And Borg just had its 20th anniversary.

The Decision to go Open Source

Wow. So like, as you're working on Kubernetes, when was the moment where... was it always designed to be open sourced or was there like a very specific point in time where...?

No, actually initially we went through this phase where... Our group was put together with a couple of directors from cloud and from our internal infrastructure teams, it was called the Unified Compute Working Group. And the original charter was to figure out something that was more flexible than App Engine, which was just platform as a service, and lighter weight than virtual machines and Compute Engine. Something nominally more Borg-like that Google itself could run workloads on eventually, because Google had kind of skipped virtual machines and had never really run workloads on virtual machines. We'd used raw machines and then containers, working on developing cgroups in the Linux kernel and such. Around the time I started at Google, cgroups were rolling out within Borg. So we kind of skipped the whole VM wave and there was kind of no going back to that. And Google has a culture of trying to unify all the things. So the idea was, how can we create this flexible container layer as a cloud product? So there were several proposals for different ways we could create cloud products along those lines. So I would say between August and November, and those proposals kind of continued into the spring as well, but really it was just getting started in that August to November time period. And that's also when I was sketching the initial API. So the open source angle didn't really come in... I was actually looking through my notes in prep for this recently, and it didn't really come in until like the December timeframe.

Okay.

There were a bunch of reasons for why open source was proposed. And actually, over time, even more and more reasons continue to occur to us about why it made so much sense for us to do that. And December is when GCE reached GA as well, just coincidentally, or maybe not coincidentally. So Google Cloud is just kind of starting its journey at that point in time.

The proposal to make it open source really made sense to all the people who were involved. You know, Docker was open source. Mesos was open source, and that was kind of the leading execution runtime for larger deployments, especially on-prem deployments at the time. It actually had some overlapping history with Borg as well. So all the open source competition was very much on our minds at the time. And also Google had had a history of writing academic kind of papers about MapReduce and Chubby and the Google File System, et cetera, et cetera, et cetera. And open source projects emerged copying those systems. So we talked about Hadoop and getting "hadooped", which, you know, it's like we come out with the idea and someone else builds the popular open source project. Certainly we wanted to avoid that, but we did want to enable our customers to be able to take advantage of whatever we built and run on Google Cloud and reduce the friction for running on Google Cloud. I think we talked about the rationale for this in the Kubernetes documentary but, obviously, at the time we had close to zero market share because it had just launched, and we felt the need to really disrupt the space and shift the entire industry towards containers. Which I think Eric Brewer talked about when we open sourced Kubernetes at DockerCon.

Yeah.

We had a lot of experience with that model and we knew a wide variety of workloads could run really well using containers. And, for the folks involved, we felt strongly that it was definitely the way to go. That is, containers in general and open source specifically. So it took a while to get agreements that that's what we were actually going to do. And that agreement came very late in the entire process. But yeah, it just made sense for so many different reasons.

Google and Open Source Initiatives

Was this Google's first big open source initiative? Like you said, there were a lot of academic papers.

No, we actually went and talked to a number of the other open source efforts. They're all pretty different. So like Angular was open source.

Oh yeah, that's right.

Chrome and Android were open source and Golang was open source. So Google already had thousands of open source projects, some of which were very widely used and very well known. But for what we were trying to do with the Kubernetes project, they all seemed very, very different in their flavor and how the projects were run and what they were trying to do. Like in terms of building community around the project, a lot of these projects didn't really try to do that in the same way. So we were really treading new ground there, especially as the project grew once we open sourced it.

Yeah, and it seemed like it was pretty quick that you started working with other organizations, CoreOS etcetera being involved.

I think Clayton Coleman at Red Hat started sending PRs within a week or two of open sourcing it. Yeah, I think within a few weeks we had doubled the size of the effort, more or less. And then by February there were dozens of engineers working on it across the community.

The Potential of Kubernetes

Yeah. When was the first point where you realized that it had the potential to become as big as it has become?

You know, I don't know that there was a single point. We were so busy just trying to stay ahead of the competition. We did watch the other open source efforts, the other commercial products very closely, and tried to figure out what users needed, what appealed to them. So I think that was one of the great things about the open source project, which we predicted. But it did work out that it brought us much closer to users really early on and we could meet their needs much faster, partly due to their direct contributions but also just the open feedback. We took a lot of cues from Docker, I give them a lot of credit in terms of how to run a community. They had an IRC channel, we created an IRC channel. They had regular meetings, we started regular meetings. So we really learned a lot from them in the early days, I would say, in terms of how to engage your community. Later they kind of diverged and we became our own thing with special interest groups and working groups and the steering committee and whatnot, but it was really very informative in those early days.

The first year, well, a little more than a year... Kubernetes reached 1.0 in July 2015, and that was because we set that as a milestone when we created the Cloud Native Computing Foundation for the project. So that milestone was set, I don't remember exactly, somewhere like March or April 2015. And we worked hard to hit that milestone, ripping out features and things like that, postponing them in order to hit it. And then we watched the competition. Docker released the kind of Swarm v2, is how I think about it, at DockerCon 2015. Still no control plane, just a command line tool. So not too worried about that, because we knew from Borg how powerful the control plane could really be. The following year, when Swarm launched a control plane with Swarm mode, that was more of a concern. So we tracked those sorts of things very closely. We worked together with Mesosphere on some areas and to some degree competed for users, but it didn't come up as much. Especially at the beginning, Kubernetes was considered to be smaller scale and Mesos was considered to be large scale, so we were targeting very different kinds of users.

Yeah.

So I'd say by the end of 2017 is when it became really clear that we were doing something right.

Yeah.

Because I think by the end of 2017 Kubernetes was everywhere. Like it was ubiquitous. Amazon had a product, Azure had a product, Red Hat had a product, VMware had a product, everybody had a product.

Was there coordination with the other clouds around their products? Or they just saw the potential in it and they just started running?

There was no real coordination there. I did start the Kubernetes conformance program to mitigate the potential of fragmentation as more and more products were created. That was something I was very concerned about. I wouldn't want someone to fork Kubernetes and extend the feature set in an incompatible way, or to freeze the feature set and make it difficult for us to continue to improve the project.

So we took some lessons learned from some other open source efforts. We looked at how OpenStack and Android and some other, literal standards approached the conformance problem. Because really, in retrospect, the problem we had is very similar to a standard. Like Wi-Fi is the example I use, where if you have a lot of vendors creating Wi-Fi routers, and a lot of vendors creating Wi-Fi clients... Wi-Fi devices that need to connect to the routers, they all need to interoperate seamlessly. So the only way you can do that is by having a really clear, solid, comprehensive spec that both sides can implement and validate that they meet that spec. And that's essentially what we needed with Kubernetes. We wanted an ecosystem of management tools and workloads and everything you could imagine to just work on any Kubernetes cluster to the extent possible. So that informed how we treated the conformance program, using a testing based approach. So it wasn't fully specified in terms of all the interfaces and whatnot, but we created a suite of tests that would be executed against a Kubernetes cluster to ensure that it behaved as expected. Kind of end to end tests. And in order to prevent the problem of getting frozen in time, we moved the goalposts deliberately. So your conformance is only valid for a bounded amount of time, for the duration that the Kubernetes releases were considered supported. And you have to recertify with new releases.
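
To give a feel for what "testing based" means here: the real conformance suite is a curated set of the project's Go end-to-end tests run against a live cluster, but the shape of each check is simple. A toy sketch in Python, where the `cluster` client object is purely hypothetical:

```python
# Toy illustration of a conformance-style behavioral test (not the real suite,
# which is written in Go). `cluster` stands in for any Kubernetes API client.
def test_configmap_roundtrip(cluster) -> None:
    cm = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "conformance-demo"},
        "data": {"key": "value"},
    }
    cluster.create(cm)
    try:
        # Every conformant cluster must store and serve the data unchanged,
        # regardless of which vendor built or hosts it.
        got = cluster.get("ConfigMap", "conformance-demo")
        assert got["data"] == {"key": "value"}
    finally:
        cluster.delete("ConfigMap", "conformance-demo")
```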

Nice. And then we also have the operator model and custom resources. Was that planned as a part of the original API design, to be able to extend how Kubernetes works, but in a Kubernetes native way? Was that part of the original design or is that something that kind of...?

Well, yes and no. So I knew from the experience with Borg and Omega that we needed an extensible concept space, is what I call it. Because Borg had a very rigid set of concepts that were baked in. It had jobs, tasks... actually tasks weren't even a real resource in the way you would think in terms of the cloud... So it had jobs, allocs, machines, packages, and it couldn't really be extended. So what I would see projects do is things like batch job admission and Borg cron job scheduling, and a variety of other systems, autoscaling, were built as separate systems, kind of bolted on top, and they would poll for information. And if they needed additional information to kind of drive their behavior, they didn't want to create another database or another store that had to store this data, so they asked for ways to extend the Borg API to put in their configuration effectively. So that was pretty ugly. It worked, but it was pretty ugly. So I wanted a way that we could add these as first class resources effectively.

But in the beginning we were just trying to build something to run containers. So we just had to get that out really quickly and put that together as a sprint to get all that done by 1.0. So it wasn't until after 1.0 that third party resources were created and we had a couple of different kinds of extensibility we needed to support. For example, we had some use cases where we needed to support APIs that were not backed by etcd so we actually created that mechanism first, this is the API aggregation mechanism. And the API aggregation mechanism ended up being used to create custom resources later.

Okay.

But yeah, the initial implementation of third party resources came after 1.0. 1.0 only had the replication controller in terms of workload controllers. But we knew already that we needed to have higher level deployment objects, DaemonSets, jobs, cron, et cetera. And that was going to create a bunch more resource types. And we were worried about where to draw the line. So in particular we debated should cron be a built in concept or not? So that is kind of what started to spark the idea of having a first class mechanism. Like we didn't set out saying, let's build a control plane that can control all the things. That wasn't the main goal, but we did know that there was this large ecosystem that needed to exist for the different kinds of workloads and automation that people would build. Once we put third party resources and custom resource definitions out there... I would say operators were not something that had really come up in terms of building workload specific operators to automate Kafka or to automate a specific kind of database or something like that. That wasn't really the intended purpose. I mean, it's pretty interesting to see what people ended up doing with it, or things like Crossplane or Config Connector to configure infrastructure.

Yeah.

We knew we would need to be able to easily extend the API. Exactly how we would do it... You know, I would say we didn't do premature generalization. We generalized it at the point where we thought we were getting close to where we wanted to draw the line. We still added a few built in resource types after creating CRDs. In particular, we hadn't implemented binary encoding for CRDs yet, and for some, like the Lease resource, we were worried about it not being efficient enough if it were just JSON. So we made that a built-in resource type. But at least we had that release valve where we could say, no, let's just at least first implement things with custom resources.
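
To make that "extensible concept space" concrete, here is roughly what extending the API looks like as data. The group and kind below are made-up examples; the point is that a CustomResourceDefinition is itself just another resource, and once it is created, instances of the new type are declared exactly like the built-in ones:

```python
# Hypothetical example of extending the API with a new "Widget" type.
# A CRD is itself just a resource; once applied, the API server serves
# /apis/example.com/v1/namespaces/<ns>/widgets like a built-in endpoint.
widget_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "widgets.example.com"},
    "spec": {
        "group": "example.com",
        "scope": "Namespaced",
        "names": {"plural": "widgets", "singular": "widget", "kind": "Widget"},
        "versions": [{
            "name": "v1",
            "served": True,
            "storage": True,
            "schema": {
                "openAPIV3Schema": {
                    "type": "object",
                    "properties": {
                        "spec": {
                            "type": "object",
                            "properties": {"size": {"type": "integer"}},
                        },
                    },
                },
            },
        }],
    },
}

# An instance of the new type looks like any other Kubernetes resource.
my_widget = {
    "apiVersion": "example.com/v1",
    "kind": "Widget",
    "metadata": {"name": "demo"},
    "spec": {"size": 3},
}
```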

The Declarative Model

You weren't just involved in building Kubernetes itself, you also had a lot of impact on how the tooling to interact with Kubernetes was designed. You worked on Kubectl, Kustomize... and at the same time Kubernetes is coming out, Terraform is coming out. They both have this declarative model of managing resources. What led you to the declarative model? What was the motivation behind that?

Yeah, so within Google, Borg and many other systems are all configured declaratively. Borg uses a language called Borg config, which has ended up being generalized into a more general-purpose config language. You know, it's very similar to Terraform in many ways. It's less extensible. So you know, it's built to be specific to Borg and Borg's APIs, but in many other ways it's very similar. It's a domain specific language for generating configuration for Borg. And there was a lot of configuration written in this language, millions and millions and millions of lines of configuration. And it seemed excessively complex, and it was hard to extend the set of tooling around that.

So I wanted a clear contract between the system and the tooling to make it easier to build tooling. And in particular, you know, things like autoscaling created special case diff code in the tool, which would just have to know that, oh, this field of this resource is going to change because some other system might tweak it.

Oh wow.

And that's not really a very scalable approach. Terraform has the same problem. Like Terraform is very intolerant of drift of that kind. There are a few principles I had in mind from the beginning. You can see this in PR 1007 in the Kubernetes repo. I'm notorious for remembering the early issue and PR numbers, by the way.

I love it.

Where I kind of wrote out my current thoughts at the time about how it should work. And early on we had this idea that resource representation in the API would be the serialization format for everything. Like you would just be able to literally just serialize the API resources and use that for declarative configuration, use it for disaster recovery, use it for storage and etcd, use it everywhere. That was a brilliant design decision, I have to say. Like that worked awesome. That just worked fantastic. And I still think it's proven over ten years that it was a fantastic decision.

So we knew that had certain consequences, like we needed to have the resources be self describing. So this is why the apiVersion and kind are in the body of every resource, even though it seems redundant with the REST path. It's like, so what? It's redundant with the REST path. But that means if you have a representation of that on disk, the client tool can figure out what API to call from that representation without any external information. You don't need a client side provider layer of the kind that's in Terraform. You can actually just have a very simple client, a script even, that just extracts that information and cURLs the API and you're off to the races. So that was very empowering, because you could do things like run sed or some simpler unix tool over your config and then pipe it to cURL or whatever.
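
As a rough illustration of how little a client needs once resources are self describing, here is a minimal Python sketch (not the real kubectl; it assumes a local kubectl proxy on port 8001 and uses naive pluralization) that derives the API path straight from the apiVersion and kind in the body:

```python
# Minimal sketch of a "dumb" client: everything it needs is in the resource body.
import json
import urllib.request

def api_path(resource: dict, namespace: str = "default") -> str:
    """Derive the collection path from apiVersion and kind alone."""
    api_version = resource["apiVersion"]          # e.g. "v1" or "apps/v1"
    plural = resource["kind"].lower() + "s"       # naive pluralization, for illustration only
    prefix = "/api" if "/" not in api_version else "/apis"
    return f"{prefix}/{api_version}/namespaces/{namespace}/{plural}"

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},
    "spec": {"selector": {"app": "web"}, "ports": [{"port": 80}]},
}

# The serialized resource is also the request body; no provider layer needed.
req = urllib.request.Request(
    "http://127.0.0.1:8001" + api_path(service),  # assumes `kubectl proxy` is running
    data=json.dumps(service).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually create it against a cluster
```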

Our initial command line tool was called cloudcfg, which was then renamed to kubecfg, which was then replaced entirely by Kubectl. And with Kubectl we had a bunch of goals of making the foundation more solid and whatnot. But one of the things I really wanted to set up is support for declarative configuration in the future. A key aspect of that was a model for bulk operations. So instead of the gcloud model of putting the service name or the resource name first and then the verb after, and then the arguments specific to that imperative operation, the way Kubectl works is the verb is first and after that come the nouns. And in some cases it could be a specific noun, like Kubectl create configmap blah blah blah, if you need to do something kind of imperative and very specific. But you can also just pass files, and those files can contain resource types that are heterogeneous. And any number of files and any number of resources. So that pattern reinforced the consistency across different resource types in the API and also paved the way for these bulk operations being able to manage multiple resources all at the same time. Because every use case in Kubernetes, every use case in cloud, pretty much involves several resources. Like in Kubernetes: a service and a deployment, or ingress, service and deployment, or ingress, service, deployment, config map, horizontal pod autoscaler, et cetera.

So Apply we started working on, and realized, well, it had some tricky aspects. Sam Ghods from Box wrote the initial Kubectl proposal and worked on that with us and wrote the initial Kubectl code, and started figuring out okay, what is it going to take to be able to do Apply? And with Apply I really wanted to solve this problem of special case diff code. We didn't really entirely solve it until server side Apply, but this idea that fields in resources might get populated automatically, either because they're defaults, or they're set by admission control (which came later), or they're set by an asynchronous controller, like the autoscaler is the canonical example. We wanted to be able to be tolerant of that. So you could specify a subset of the desired state and have that enforced declaratively, but you didn't necessarily have to control the full desired state. You could interoperate with other automation. So that was a key part of the design, because I just saw so many special case hacks in Borg where that was not considered. Whether it's horizontal autoscaling or vertical autoscaling, or setting security settings, or automatically plugging in some application config values or whatever it is. I felt strongly that there was no hard line where you could say, oh, these things are going to be set by automation, and these things clearly will never be set by automation. I just did not believe that anybody would be able to draw that line in an accurate fashion. So I wanted a more general mechanism that could, as Daniel Smith called it, merge intent from both humans and robots, basically.
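
A toy way to picture that "merging intent" idea (this is not Kubernetes' actual apply logic, which also tracks field ownership and does strategic merges on lists): the user's configuration only enforces the fields it declares, so values set by automation survive the apply.

```python
# Toy sketch of applying a partial desired state without clobbering fields
# owned by automation (e.g. replicas managed by a horizontal pod autoscaler).
def apply_subset(live: dict, desired: dict) -> dict:
    """Overlay only the fields present in `desired` onto the live object."""
    merged = dict(live)
    for key, value in desired.items():
        if isinstance(value, dict) and isinstance(live.get(key), dict):
            merged[key] = apply_subset(live[key], value)
        else:
            merged[key] = value  # lists are replaced wholesale here; real apply is smarter
    return merged

live = {"spec": {"replicas": 7, "template": {"spec": {"containers": [{"name": "web", "image": "web:v1"}]}}}}
intent = {"spec": {"template": {"spec": {"containers": [{"name": "web", "image": "web:v2"}]}}}}

merged = apply_subset(live, intent)
assert merged["spec"]["replicas"] == 7  # the autoscaler's value is untouched
assert merged["spec"]["template"]["spec"]["containers"][0]["image"] == "web:v2"
```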

Yeah. There's two actors on the system, right? I mean that's one of the things that's still, you know, painful with other declarative state models like Terraform. Like something will make a change to the system and now you have drift.

Right. So that eliminated several classes of drift right there, by doing that. Also, looking at where we drew the line differently from Terraform, informed by the experience in Borg, I did not want client side orchestration to be required for as much as possible. Like you do an apply on the client side and let the controller sort it out. So you can create pods referencing containers that haven't been pushed to your container registry yet, and the Kubelet will give you image pull backoff for a while, but it will keep retrying, and when the image shows up it will be able to pull it. In general, we wanted the system to be resilient in that way, because things go up and down or take time to propagate or get applied out of order or whatever. And I wanted the system to be tolerant of that. So that was a principle we enforced across all the initial controllers. There are some things that didn't entirely respect it later on, but for the most part, yeah, you can just do Kubectl apply and ordering doesn't matter, and timing doesn't matter, you don't need a heavy client side orchestrator, and it just works.
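
That "let the controller sort it out" behavior is essentially a level-triggered reconcile loop: keep comparing desired state to what is actually available and retry with backoff until they converge. A toy sketch (the function names and backoff numbers are made up, not the Kubelet's actual code):

```python
# Toy level-triggered reconciliation: ordering and timing don't matter because
# the loop just keeps retrying until the desired image actually exists.
import time

def reconcile_image(desired_image: str, list_registry, pull) -> None:
    """`list_registry` returns the images currently pushed; `pull` fetches one."""
    backoff = 1.0
    while desired_image not in list_registry():
        print(f"{desired_image!r} not in registry yet; backing off {backoff:.0f}s")
        time.sleep(backoff)
        backoff = min(backoff * 2, 300)  # capped exponential backoff, like image pull backoff
    pull(desired_image)
    print(f"pulled {desired_image!r}; the pod can start now")
```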

Kustomize

So Kustomize is an interesting one. I'm a Kustomize fan and I feel like Kustomize does not get the love that it should have. And I don't know if it was just that Helm had like gotten such a foothold at the point in time that Kustomize was coming out. But like, what is your take on Kustomize versus other ways of managing resources? And is there anything that you think that maybe the community at large just misunderstood about the Kustomize model?

So with the success of making it relatively easier to build tools and having this clean contract between the clients and the servers, there were lots of tools, like hundreds of tools. Helm definitely became the dominant tool. And I brought Helm into the Kubernetes project. We had another team at Google working with the DeisLabs team on Helm, because we did need that ecosystem of components of off the shelf applications that Docker had, that Ansible had, that Chef had, that Puppet had... that was kind of a proven way to do it. So they were off working on that.

Looking at the complexity of the Helm charts, we had discussed several different templating options in Kubectl, and also features like declaratively specifying multiple files and declaratively generating config maps from application config. And so the original Kustomize KEP mentions, I think, around a half dozen of these kind of longstanding user requests that we were pondering how to best support. So the idea for Kustomize was just that we should just have a kind of a simple out of the box tool for solving those basic declarative configuration needs for people using the Kubernetes project. You know, because people using Kubernetes would at least have a development environment and a production environment where they need to redeploy the same new versions of their image to the same deployment, or update their config maps, things like that. So we just wanted a super simple tool for those basic workflows without having to learn Jsonnet or Starlark or some other much heavier weight solution. I didn't want to go the direction of Borg config. I really resisted officially blessing any of those domain specific languages. Kustomize felt like it could be Kubernetes native. We could use the Kubernetes patch mechanism, much like other commands in Kubectl. We could know where the image field is in the Deployment resource and could just go change it for the user. We could understand what config maps were and, just like the Kubectl create configmap imperative command, we could do that declaratively. So it really felt complementary to Kubectl. And it could solve a bunch of these simple use cases. It wouldn't solve the problem of off the shelf workloads. But if you are just building an app and you want to run it on Kubernetes, it felt like it could solve those common cases.

So originally, I think it was called Kexpand. My proposal for it was in this really long document that some people called the damn bible, the declarative application management bible. But that document is still in the design doc archive in the Kubernetes repo, or the community repo I think it is, or one of the repos. So that proposal was kind of extracted out of that document, and Jeff Regan built the initial implementation.

I really liked the ergonomics of it. It was simple, it was YAML. The patching aspect was really heavily emphasized, which I think was unfortunate. The built in transforms, like to set the images or the replicas, I think are super useful, and more of those could potentially be added if there are super common transforms that people need and want a simple shorthand to specify declaratively. But that's really what got me thinking about this idea of configuration as data, but actually just manipulating the configuration data.
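
"Configuration as data" here just means operating on the parsed manifests directly rather than templating text. A rough Python illustration of what a built-in transform like "set the image" amounts to (Kustomize itself is a Go tool and handles far more cases than this):

```python
# Rough sketch of a "set image" transform applied to configuration as data.
def set_image(manifests: list[dict], container_name: str, new_image: str) -> None:
    """Rewrite the named container's image in any workload that has a pod template."""
    for manifest in manifests:
        containers = (
            manifest.get("spec", {})
                    .get("template", {})
                    .get("spec", {})
                    .get("containers", [])
        )
        for container in containers:
            if container.get("name") == container_name:
                container["image"] = new_image

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"template": {"spec": {"containers": [{"name": "web", "image": "web:v1"}]}}},
}

# e.g. promote a new image tag for the production environment's overlay
set_image([deployment], "web", "web:v2")
assert deployment["spec"]["template"]["spec"]["containers"][0]["image"] == "web:v2"
```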

It still feels like to me, honestly, if you're not grabbing off the shelf stuff that you need to highly customize, it feels like the easiest way of getting an application out, especially if you have multiple environments you have to deal with. It's still my preferred tool. But it's one of those things, I feel like on teams it's like, well, we're using Helm for this and this and this and this, why don't we just suffer through Helm for this? And it's like, well, there's another tool that we could use. It's just a bit easier.

Yeah. And one thing that I found interesting is the recent discussion about the rendered manifest pattern, where people are rendering out their configurations, their Helm charts, in advance to understand what is actually generated.

Yeah.

If the things you need to customize are relatively minor, then Kustomize just makes that more obvious.

The Evolution of Kubernetes

You know, we're ten years in, a lot has changed over the years in Kubernetes, a lot has changed in the cloud space. With the Kubernetes steering committee, which you're also a founding member of, how have you been able to approach evolving the project while maintaining this original architectural consistency? Like, as more companies are coming on, more organizations. I've heard that, like, it's even getting interesting now with how we abstract away the compute, but now, like, GPUs have different profiles that we kind of have to take into consideration. So how have you maintained that while also making the project so easy for us to evolve?

Yeah, the steering committee actually came relatively late in the project. We formed the bootstrap committee in 2017. For the folks involved with the project, very early on, we kind of knew who the deciders were amongst the people working on the project and there wasn't really any debate. Then once the project was growing and we started to formalize that into special interest groups, some of that became more formal. We also created a concept of reviewers and approvers. Because Kubernetes mostly started with a kind of a mono repo, we had to subdivide ownership of the repository. This is before GitHub and GitLab had code owners mechanisms. We built our own, inspired by Google's internal owners mechanism. Started distributing ownership of the code base that way, Kubelets, API server, API, etcetera, kind of giving approver rights to the people who had the most domain knowledge about those areas and who'd been leading those areas pretty much since the beginning of the project, in many cases.

Eventually we had to figure out how to bridge the SIG structure of incorporating the community with that reviewer approver structure. So I created this concept called subproject, which would be subprojects of SIGs, where the sub projects would have reviewers and approvers and have ownership over particular parts of the code base. So I think that part worked really well.

When we formed the steering committee, we thought it was very important for the steering committee to be responsible for certain aspects of the governance of the project, but not the technical governance.

Interesting.

So the steering committee is by election, and we did not necessarily want to put the technical decision making power into the hands of an elected body who may or may not have the experience or the expertise with the project.

Yeah, that's interesting.

So I created SIG architecture to hold that power. We had kind of an explicit delegation of that responsibility to SIG architecture. To decide those high level project principles, things like API conventions. We created this production readiness review effort to ensure that new features would be sufficiently observable and reliable and tested and things like that.

The conformance program was put under SIG Architecture's purview as well, to kind of define what is the expected behavior of Kubernetes and maintain that over time. We also developed this KEP process, the Kubernetes enhancement proposals. We had an informal design documentation process since the very early days of the project. As we brought in more and more people that was formalized into this KEP process. So the people who would review those KEPs would be the technical deciders for those areas. So that would give the opportunity for broader review within the project, since there was a process for publishing the KEPs and a period for reviewing them. But, you know, also ensure that things got reviewed before they got added to the codebase.

As it so happens, we also started pushing code into extensions with CRDs and other repos. So we let the SIGs kind of create their own repos in the Kubernetes SIGs organization on GitHub and experiment with things, because we didn't want to actually block progress. But if they're going to be in Kubernetes by default, then we wanted to have a higher bar. So I felt like that part of the process worked pretty well. Kustomize started in one of these SIG CLI repos as a totally separate tool before it was brought into Kubectl.

But yeah, I mean, over time the project has matured, has slowed down, has a huge ecosystem around it. So you don't necessarily need to be in the Kubernetes project to create functionality related to Kubernetes in your own project, in your own tool. And sometimes that's even better because as a standalone project, to some degree, you can have your own brand, make your own decisions, have your own community around it. Under the Kubernetes umbrella, the project's so huge, things can get lost or drowned out. Kubernetes releases have always been pretty big, things can definitely get lost in the noise there.

I don't know how many all time contributors there are now, but as of like 2018 or 2019, I think there were around 50,000 all time contributors. Which just seems nuts!

Yeah.

It was kind of nuts, but we had a strategy to try to engage the community through contributorship. And there's always this tension about how do we actually manage the volume? Do we really need more contributors? How can we ensure that it doesn't overwhelm the maintainers? So that was always a delicate balance. But I think at one point something like 12% of users had contributed. Which I felt like was a success.

Yeah.

That strategy was working. And I think the IKEA effect did build a number of advocates. Even if people just fixed documentation or wrote new documentation. Like I wrote the user guide, documentation is totally necessary. It's totally awesome. That's a valid contribution. I think it keeps the project in touch with the user base as well as vice versa.

It's just a phenomenal amount of contributors, companies, a massive project that's still evolving, is easy to extend... I feel like the SIG groups that are created around it, I mean, I think that's one of the more phenomenal things I've seen in open source. Like just the sheer number of people involved, the size of the project and how big it is. And it hasn't ground to a halt in bureaucracy, which is just absolutely impressive. I feel like I'm pretty up to date on Kubernetes and like I'll see a new release come out and I'm like, "Oh my gosh, I've got so much to catch up on. Like so much came out in this new release." Like it's moving at a cadence still that I feel like many open source projects don't move at.

Yeah, well, software has to evolve in order to stay relevant. You mentioned the AI workloads earlier, we deliberately started with just simple stateless workloads on Kubernetes as kind of the biggest segment of workloads it would be easy to get running on a system that would move containers around. But we had a mission, I guess, to bring a more diverse set of workloads on to the platform. So as I mentioned, at 1.0 we knew already that we would have stateful workloads and DaemonSets and batch jobs and cron jobs and so on. At one point we had an effort to get Spark running on Kubernetes. We had efforts to get other popular workloads running on Kubernetes. And AI workloads are just kind of the latest example of that, where we want those workloads to be able to take advantage of the whole ecosystem around Kubernetes. The config tooling, the CI/CD tooling, the monitoring and logging integrations, the networking and storage integrations, the secret manager integrations. Like all those integrations are a lot of work to do, even individually, much less the aggregation of all of those things. Policy enforcement, et cetera.

So I do think there are benefits to running pretty much almost any kind of workload, monolithic workloads, any kind of workload on Kubernetes. So that's why we made decisions like we need to support DNS so that workloads that use DNS to discover instances can run on Kubernetes instead of having to rewrite those workloads to use a bespoke service discovery mechanism or something like that. So I'm hopeful that the community will rally around the best way to make GPU based workloads, for instance, run on Kubernetes. That will just be kind of the latest example of an important workload that has been made to run on it.

AI and ML

Can you just maybe elaborate a little on what we're seeing in AI and ML workloads that is kind of different from the workloads that we're currently running today on Kubernetes? Stateless or maybe even stateful workloads?

Yeah, I think there are multiple aspects of that. One is how the resources are scheduled and partitioned. So at the beginning of Kubernetes, and actually early in Borg as well, the most important resources to subdivide were just CPU and memory. With cgroups you can carve up CPU, you can carve up memory, and the scheduler can look for a node that nominally has enough CPU and memory to run that workload. GPUs cannot be sliced and diced in the same way. So if you have an inference workload that doesn't need a whole H100 or something like that, then what do you do? That needs to be representable in a way that the scheduler can deal with, where it can kind of see what resources are available semi-statically and understand what the constraints are for choosing where that workload can go. So if there are a bunch of constraints like, you know, you shouldn't put certain types of workloads together, or you can only subdivide with a certain granularity, those kinds of constraints... we didn't really have mechanisms to deal with those things, so we have to figure out how to model those. And there are also issues at kind of the device level and the driver level, figuring out how to partition the devices or to understand what parts are available.
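
One way to see the scheduling difference: CPU and memory requests can be checked by simple arithmetic against a node's free capacity, while an accelerator request has to match a discrete device or partition shape. A toy fit check under those simplified assumptions (the MIG-style profile names are just examples, not a real scheduler's model):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    cpu_free: float                    # fungible: any fraction can be allocated
    mem_free_gib: float                # fungible
    gpu_profiles_free: dict[str, int] = field(default_factory=dict)  # discrete partitions

def fits(node: Node, cpu: float, mem_gib: float, gpu_profile: str | None = None) -> bool:
    """CPU/memory just need headroom; a GPU request needs a matching whole partition."""
    if cpu > node.cpu_free or mem_gib > node.mem_free_gib:
        return False
    if gpu_profile is not None and node.gpu_profiles_free.get(gpu_profile, 0) < 1:
        return False
    return True

node = Node(cpu_free=12.5, mem_free_gib=48.0, gpu_profiles_free={"1g.10gb": 2, "3g.40gb": 0})
assert fits(node, cpu=0.25, mem_gib=1.0)                            # any sliver of CPU/memory is fine
assert fits(node, cpu=2.0, mem_gib=8.0, gpu_profile="1g.10gb")      # a free partition of the right shape exists
assert not fits(node, cpu=2.0, mem_gib=8.0, gpu_profile="3g.40gb")  # that shape isn't available, so no fit
```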

There are some reliability issues in terms of detection. We have this node problem detector that detects certain types of problems. But if you have to look somewhere else to see if there may be some problem with the accelerator on the machine, then that mechanism needs to be extended to do that. I think the talks at KubeCon EU about dynamic resource allocation and some of these other things that don't entirely mesh with the current model... I think the community will work through that. But it is a new resource, basically, that has different characteristics from the previous resources. CPU is very fungible, memory is mostly pretty fungible. The accelerator devices are not as fungible.

At the cluster level, training workloads are very different. They're much more like the supercomputing workloads that I worked on back in the nineties, where basically there are lots of machines running the same program at the same time that need to communicate synchronously.

These are like all or nothing kind of builds, right? They all have to execute successfully or they all kind of fail.

Yeah, so I think discussing what the best way is to address those workloads, that's pretty much a totally different problem from everything I just talked about. It's kind of another topic that would need to be figured out.

Batch workloads had already been discussed, and mechanisms like Kueue were being developed to run workloads sequentially if they can't all fit on the cluster at the same time. Because if they're continuously running workloads, you just have to do this footprint allocation. If they are bounded in duration, like batch workloads, then you can schedule them in time instead of just in space. Those are different dimensions in this sort of new category of workload that has different macro characteristics, which you may need to just handle differently at that level.

Some people run their clusters differently, even for just stateful workloads versus stateless workloads. Especially if it's not running in the cloud, if it's running on prem or something like that. If there's data gravity involved then they may need to manage the clusters differently. So I think this is another example of where the workload characteristics may change how the clusters are managed.

Unsolved Challenges in the Cloud Native Field

Yeah. Besides AI and GPUs, what other new frontiers or unsolved challenges do you see in the cloud native field?

I think the big challenge that people have been talking about for a while is the complexity. Personally, I think it's great that all of these projects are available, you know, otherwise it wouldn't be feasible for an organization to build their own service mesh and their own policy enforcement engine and their own everything... unless you're at Google scale or something like that.

Yeah.

So having those things available is awesome. Each one does add some operational complexity. So figuring out how to help users mitigate that complexity, I think, is an interesting challenge. And part of it can just be in terms of decision process, like, how do you decide whether you need this tool or that tool or this other tool, and whether it will have a large enough return on investment that is worth incorporating into your stack. But there are probably other things that could be done, especially as Kubernetes matures, to ensure that these things don't have excessive operational burden or introduce too much operational complexity. So I think that may be interesting to look at more holistically than, you know, like on a project by project basis to see if there are common patterns or standards or approaches that could be applied.

Yeah, one of the things I've always found interesting about the complexity problem is like... it's definitely there. Jumping into Kubernetes day one is not like an easy task... we're getting our systems up and we're either home growing something or learning something else. I've always considered the greatest feature of Kubernetes is a portable knowledge set for me, whether I'm working on my local machine or in AWS or GCP or Azure.  I kind of have customers all over the place but I have the same set of skills that I can move whether it's from cloud to cloud or company to company. That's always been the biggest benefit to me, that boon of us all finally being able to agree on how to operate things and that it would work from one place to another.

Yeah, I completely agree with that. A few years ago at Kubecon I gave a talk about ten different ways to think about Kubernetes. One of the ways I think about it is it's like a cloud in a box.

The API was modeled after infrastructure as a service, but it's just kind of an operating system level of abstraction instead of a hardware level abstraction. But it's intended to be equally or almost as flexible as those primitives. There's a compute primitive, which is Pods. There's a load balancing primitive, which is Services. There's an L7 primitive, which is Ingress or Gateway. There's a storage primitive, et cetera, et cetera.

The fact that we could create this kind of cloud in a box scenario, and it's not something that just runs individual containers, or just one thing or a couple of things like some of the other tools, means that you can have this bigger island of portability across all the different clouds. And maybe if you're just locked into one cloud and use all their proprietary tools and services, it doesn't affect you as much. But when you go to your next job and they use a different cloud, then it starts to affect you. Or if you can't find the best of breed tool that you are looking for from that cloud, then it may affect you. So I do think that being able to create that ecosystem and that platform that is bigger than any one cloud is a really valuable aspect. If you can figure out how to run a workload on Kubernetes on Google Cloud, you can run one on AWS, you can run one on Azure, you can run one on DigitalOcean or wherever. I think that's really valuable.

And from a vendor perspective it seems like it's beneficial so that you can integrate with Kubernetes and now it works everywhere. You might do some additional integrations with individual cloud providers, but, you know, if you can constrain your scope to Kubernetes, at least initially, that's a pretty big addressable market right there.

Yeah, it's massive. And that's why there's so many people in the space. Sorry, a lot of startups listen to this podcast. That's why we're all here. Right?

Where do you see the field heading in the next five to ten years? You've been in the space for a very long time. Like where do you think the next big thing is?

Good question. Obviously there's a lot of excitement around AI right now and that introduces some opportunities that are pretty game changing. So I expect that to continue for a while.

Other than that, it'll be interesting to see what happens in the ecosystem, whether some additional standards at the level of Kubernetes emerge or not, in the sense of how widely used they are. Because there is a lot of fragmentation right now. I haven't really seen a lot of like clear distributions emerge, which is a little bit surprising. So maybe that will happen, we'll see some kind of standard distributions of some of these sets of tools on top. It was interesting that Glasskube emerged recently. After ten years, a package manager for managing cluster components has emerged. So yeah, I think that aspect is really interesting. Will we see more standardization of the higher levels on top of Kubernetes? Not in the sense of application platforms, because that's, I think, always going to be fragmented... but in the sense of some of these other complementary components, whether it's, you know, policy enforcement or GitOps or service mesh maybe.

The Future and Legacy of Kubernetes

You've done a ton of work in the space. Kubernetes has obviously changed a lot for a lot of different engineers, operations folk, clouds. How do you hope that your work in Kubernetes will be remembered and built upon in the coming years?

In terms of building on Kubernetes, I think people have already been doing that, so I feel pretty good about that. Kubernetes has been used for things that I didn't imagine, like Retail Edge. I didn't imagine that Kubernetes would be used in Retail Edge scenarios and be in like every Chick-fil-A restaurant or things like that. Every Target store.

Even if you don't use Kubernetes, I think there are some lessons to learn from Kubernetes. Like it drew the line in terms of declarative management in a different place compared to other tools. And it just made things just a little bit easier compared to the other infrastructure as code tools, for example. The control planes, I think, similarly, where we didn't really set out to build a control plane at the beginning per se, but we did have ideas about how to do it, and now we see people using the Kubernetes control plane for all kinds of things. In some cases it makes sense, but it may not be... If you were setting out to just build a general purpose control plane that you could use for lots of things, you might do some things a bit differently. We had very specific constraints in mind, like being able to scale down. I thought scaling down would be for more local development purposes, like minikube, as opposed to Retail Edge. But the ability to scale down and not have a huge footprint was definitely a design goal. So we didn't want to have several stateful components like a database and a message bus and a cache and maybe something else. We wanted to just be able to run with the one key value store. So that informed the approach we took to the system design. So if you're designing with different constraints, like it's always going to run on cloud and doesn't have to keep up with real time events, things like that, you probably would make different decisions. But I still think there are some lessons learned about what constitutes a control plane and what needs to be part of a control plane that could be taken away from the project. So I think there are some parts of the design that people could take lessons from. There are a bunch of good blog posts and presentations and things about the Kubernetes design. It could be informative if people are looking to build their own systems for a different purpose.

A few years back I maintained the operator framework for Elixir and Erlang, and just being able to go in... reading the docs is one thing, but going in and getting familiar enough with Kubernetes from the ground up or even going through Kelsey's Kubernetes the Hard Way... There is so much interesting stuff in just the way that it was designed and layered together that I think it's just great to be familiar with to apply in other places in your development processes. There's a lot of really interesting ways that things were designed. And the key value store at the center, it seems like a simple choice, but it just empowered so many other ways... like you're saying, like being able to run it at Edge. I know somebody who's putting it on satellites currently. It's kind of everywhere. And it's empowered by these simple, clean primitives that we have at the base.

Yeah, but you know, if you needed to scale it to 100 million instances or something, you probably wouldn't design it that way.

So you kind of have to understand what the design constraints were and what envelope it was intended to fit within. And then you can decide, well, you know, my requirements are different in this way, so I need to make this kind of change to the way the system is designed. One aspect of the design was that we assumed that most of the entities would be updated frequently. So if most of the entities are updated frequently, then the controllers need to have them in memory. So there are things like that. Whereas if, you know, it's like a retail site's product catalog, you'd probably design it differently. It may be interesting as a design case study at some point.

Advice for People Starting in the Industry

Is there any advice that you would give for just people getting into the space, into operations, distributed systems? What advice would you give for somebody there?

Definitely use the managed service if you can. That's the easiest way. Also familiarize yourself with the failure modes so you don't have to debug them when it's critical. I know from the early days of Borg when I was on call, debugging production when production is down is not super fun. It's easier to kind of do it in a more isolated, less pressured scenario. So I think at this point, a lot of the failure modes are pretty well known. And some of them are preventable, but in any case, understanding the things that are likely to go wrong before they kind of do go wrong in a critical situation, I think it's the first thing I would recommend. So I think one of the reasons why Kubernetes the Hard Way resonated with a lot of people is it helped them understand, okay, what are the components? How do they interact? How do the components stay running? What can go wrong with those components?

I was actually looking back through the Kubernetes 1.0 milestone issue list at a lot of the things we were working on, because we wanted Kubernetes to be usable in production. So we had run a lot of integration tests and a lot of users were kicking the tires and things like that, but we had cases where dead Docker containers could pile up and they would just continue piling up until the node disk filled up. So those seem kind of trivial, but those basic things like log rotation and cleaning up dead containers and restarting components and things like that, that was a lot of the kind of the productionization work we did leading up to Kubernetes 1.0. So just familiarizing yourself with those kind of basic mechanics, especially if you're running your own cluster. What can go wrong if this component doesn't restart, or this disk fills up, or the container registry becomes inaccessible, or some random failure mode like that.

Is that milestone checklist public?

Yeah, I checked. It hasn't been deleted off of GitHub.

Oh, awesome. Okay, I'm going to put that in the show notes. That'll be a fun one to review. Ten years in. Awesome. Well, I really appreciate the time coming on the show today. Thanks so much. Where can people find you online?

I am bgrant0607 on Twitter, LinkedIn, GitHub, and Medium.

Nice. I love it. Uniform. Thanks so much. I really appreciate the time.

Yeah, it was fun. Thanks.

Important Links

Featured Guest

Brian Grant

Original Lead Architect of Kubernetes