Building Real-World Platforms: Abby Bangser on CNCF, Kratix, & Syntasso

Episode Description

When organizations grow beyond using third-party platforms, they face a critical challenge: how to build internal platforms that enable teams to work efficiently while maintaining security and compliance. Abby Bangser, founding principal engineer at Syntasso, shares insights on creating real-world platforms that strike the right balance between standardization and flexibility.

Key Insights

  • The shift from external platforms to internal ones often comes from specific business needs, like compliance requirements
  • Successful platform engineering requires finding the right balance between prescriptive standards and flexible customization
  • Platforms should offer multiple levels of abstraction - from simplified "paved paths" to advanced customization options
  • Platform teams should watch how users interact with their services to identify emerging patterns and needs

Episode Transcript

Welcome back to the Platform Engineering Podcast. Today on the show, I'm joined by someone who's doing a ton of work in the platform engineering space, Abby Bangser. Abby is the founding principal engineer at Syntasso, like a lasso, where she's working on Kratix, an open source framework that helps organizations build custom platforms with just the right balance of opinionated structure and flexibility.

She's also deeply involved with the CNCF, not just as an ambassador, but as a co-lead in the CNCF platform working group, where she's been helping shape how the industry thinks about platform engineering at scale. She's had a fascinating career as a developer, SRE, infrastructure engineer, QA. Have you done anything else? I feel like you've got all of them right here.

Bingo card.

Full stack of everything. She knows it all… broad perspective. Super happy to have you on the show today, Abby, thanks for coming.

Yeah, thanks for having me, Cory. I'm glad. I feel like we've been unable to connect, so I'm so glad to actually be here, be able to chat with you and talk all things platforms.

Yeah, I know, it was funny. I can't remember which KubeCon it was, but we finally, like, flew by each other for a moment. Was it the…

Wave as we walk down the hallways.

Yeah, I can't remember if it was EU. I was like, “Oh, hey, look… it's real life.” Yeah, it was very exciting. 

Career Background And Transition To Platform Engineering

Could you just tell us like a little bit about your background and you know, what got you into platform engineering just to start us off?

Yeah, I think you said I have the bingo card of different roles around the industry, and it's probably because I actually came in from a bit of a sideways move. I didn't have a comp sci degree, and I wasn't in any sort of engineering coming out of university, but I got really interested in the automation side of things. I was at an investment firm learning how to generate data, clean data, and evaluate the quality of data through automation.

I had a friend at ThoughtWorks who was like, “We teach people how to write code professionally. Come join.” And so I joined the graduate program at ThoughtWorks as a career changer. And I think that background coming from different roles and different educational background meant that I actually found a lot of different homes within software delivery. 

So QA was my first home for a very long time in a lot of different ways. DevOps and SRE and platform engineering were really when I started to take that quality mindset and grow with it beyond “Does this work on my machine? Does this feature do what was written in the spec?” and do more of, “Does this work like our users need and want it to in their hands?” And I started thinking about that system quality and that delivery quality and that kind of thing. 

Yeah, so I've actually been in internal support and on internal teams for my entire career in software, which is quite fun.

That is awesome. So let me ask you, because I feel like now hearing your background, we have a similar path into software. So I'm going to ask you, this is me reflecting on my past… I'm just curious if you had something similar for any folks that are switching careers. I love people that do career switches. I think, honestly, some of the best engineers I know were in a different industry and then switched. 


Did you have that moment where you're like, “What in the fuck am I doing? Like, why did I switch careers?”

I mean, yes. A big part of the early part of my career was me like trying to figure out how does software actually do the thing. Like, “Okay, so I get that, like you say the software is behind websites, but what do you mean it's like changing things and calculating things and drawing things?” and just getting my head around a bunch of that was like, “What is happening?” And it was really interesting for me, the like, “Why am I here?” situation. 

The question mark there was like, I came in through the QA route, which has a very interesting background. I think some of the smartest people I've ever met and most thoughtful and most technical people I've ever met come from that QA background. But as an industry, there is definitely perceptions around QA, right? Because there's quite a wide range of humans in QA. And there's some that are in more traditional roles of kind of manual QA that follow instructions. And there's some that do manual through exploratory. And then there's some with automation and all these things. 

So I really felt like, where am I? Did I come into the wrong place? Am I going to be able to be successful here? What does this look like? And the QA community at ThoughtWorks was thankfully very diverse and also very supportive and brilliant. And you had examples of people in all those different styles. So I kind of found my home there and figured out who I was within QA and then within technology.

Yeah, that's funny. I worked for… I think it's been long enough I can say this... I worked for Raymond James quite a long time ago. And I remember when I first met the QA team, we were talking through some stuff we were automating. And he's like, “You can't really automate all QA.” And I'm like, “I think we can automate a lot of what you guys do.” And he's like, “Well, how would you automate…” and he was not joking at all, he's like, “How would you automate the elbow test?” And I was like, “What is the elbow test?” And he's like, “It's where you just put your elbow on the keyboard.” And I was like, “What the fuck are you talking about? This is the strangest thing I've ever heard.”

I mean, you don't know, he might have invented chaos engineering. You don't know, man. Like, Cory, like that could have been pre-Netflix, pre-chaos monkey, that was it.

Yeah, but it was very funny. He's like, “Whenever I see something where like the software feels a bit shitty, I just hit my elbow on the keyboard to see if it crashes. And the amount of times that like something will break just from like a bunch of inputs going in at once is…” I mean, this was like old banking software. And it just stuck with me, I just do it every once in a while. It is funny, it will just crash a form every once in a while. That's funny.

Understanding Platform Engineering And Kratix

Okay. So let's start with Syn…?

Syntasso

Syntasso, like lasso, sorry. 

So what's the philosophy behind how you all are thinking about platform engineering? Because I think your approach right now with the open source component is fairly interesting, but I'd love to hear it from you.

Yeah, thanks. So the open source component is this framework called Kratix. And what Kratix focuses on is this idea that internal platforms are internal because they're specific to an organization. If you could completely buy your processes and your provisioning and your infrastructure and all that from a third party, we are the first ones to say, “Do it.”

I've worked at organizations, I've been at other startups that are small enough that we outsourced to Vercel. We outsourced to the cloud providers. We didn't have the type of compliance and the type of business processes built up that needed customization internally to any significant degree. But it's funny because even that startup I worked in… where we had, by the time I left, I think 80 engineers or so, and when I started it was about 10 engineers… in that time of scaling, we already hit the point where we couldn't be on Vercel anymore.

And let me tell you, as the person who took the team off of Vercel, I think it was good I left that company so they could hate me forever. They adored being on Vercel. They found it to be a fantastic user experience and developer experience. But the problem is that we had a compliance requirement that we could no longer control, because we depended on a third party that didn't meet our needs.

We had to be PCI compliant. Vercel is itself PCI compliant, but it can't prove PCI compliance for any apps running on it. So the system as a whole is compliant, but its apps are not. And so we had to move off it in order to meet our customers' expectations. That's the moment where we had to move to an internal platform instead of an external platform as a service.

So with Kratix, we believe that when you are at that stage where you need an internal platform, you need things that are custom to your business, things that allow you to set standards, allow you to apply processes, things like that. You need a way to do that easily and to bring all those experts together to collaborate and democratize that experience. And that's what the Kratix framework does. It enables you to take your compute experts, your storage experts, but also your security experts and your financial experts and all of those things, bring them together and create services that are on demand in your business.

Balancing Prescriptiveness And Flexibility

Yeah, so one of the things that I've worked with people on a lot is like… so it sounds like what you're really doing is designing your own abstractions internally for these services. And one of the things that's hard, I think, with abstractions is getting the right balance of prescriptiveness and flexibility. How do you all think about that? How do you give those developers what they need without making it too rigid?

Yeah, so this is where we actually see very different opinions with our different customers. With a promise, you can give whatever level of prescriptiveness or specificity that you want. And so it's one of the things that we work the most with customers on, helping them figure out what's right for them. 

What we find as the most common solution is that people create one version of the service with all the bells and whistles that are available to them. So think of, let's say, a bucket in the cloud: something a lot of people use and a lot of people understand. There might be certain things that you never want a user in your organization to be able to manage themselves. For example, you might always want encryption. You might always want backups. You might always want versioning, et cetera. So you never expose those as optional to the users. But there are lots of other things that can be optional to the users. They don't break any of your rules as an organization. And so you make one API that's got all the things. That way, you never become a blocker for someone who's like, “But can you expose this one more field?”
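A minimal Python sketch of that "all the things" API, assuming a bucket request is a plain dictionary. The field names, `MANDATORY_SETTINGS`, and `build_complete_bucket` are hypothetical and not part of Kratix (real promises are Kubernetes resources, not Python); the sketch only models the rule that optional fields pass through while mandatory settings can never be turned off.

```python
# Hypothetical organisation-wide rules that users may never switch off.
MANDATORY_SETTINGS = {"encryption": True, "backups": True, "versioning": True}


def build_complete_bucket(user_fields: dict) -> dict:
    """Sketch of the 'complete' bucket API: any optional field is accepted,
    but a request that tries to change a mandatory setting is rejected."""
    conflicts = {
        key
        for key, value in user_fields.items()
        if key in MANDATORY_SETTINGS and value != MANDATORY_SETTINGS[key]
    }
    if conflicts:
        raise ValueError(f"cannot override mandatory settings: {sorted(conflicts)}")
    # Optional fields pass straight through; mandatory ones are always applied.
    return {**user_fields, **MANDATORY_SETTINGS}
```

Because every optional field is accepted as-is, a request such as `build_complete_bucket({"name": "reports", "storage_class": "STANDARD_IA"})` never has to wait on the platform team to expose "one more field".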

That's for the advanced use case. So to support the average use case, you then make a second-level API that's sort of the paved path. The like, “We got you, you just want a storage bucket. That's cool. Give us a couple of fields' worth of information. We'll take care of the rest.”

Those APIs actually stack on top of each other. They all get managed from the same backend system; it's just that the paved path version of the promise has a whole bunch of platform opinions built into it, and the lower-level promise only has the mandatory opinions built into it, not any of the optional ones. That way you can opt in to the level of abstraction that's right for you, and you can grow into the more advanced option. You can start on the paved path and, as your experience matures and you have more needs, switch over to that more complex API in the future.
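The stacking described here can be sketched in Python, again assuming plain-dictionary requests. `complete_bucket`, `paved_path_bucket`, and the default values are hypothetical illustrations, not Kratix APIs: the paved path asks for two fields, adds the platform's optional opinions, and then delegates to the lower-level API so the mandatory opinions are still enforced in one place.

```python
# Mandatory opinions live only in the lower-level API (hypothetical values).
MANDATORY_SETTINGS = {"encryption": True, "backups": True, "versioning": True}

# Optional opinions the platform team chose for the paved path (hypothetical values).
PAVED_PATH_DEFAULTS = {"region": "eu-west-2", "storage_class": "STANDARD", "public_access": False}


def complete_bucket(request: dict) -> dict:
    """Lower-level 'complete' API: accepts any field, enforces only mandatory rules."""
    for key, required in MANDATORY_SETTINGS.items():
        if key in request and request[key] != required:
            raise ValueError(f"{key} is mandatory and cannot be changed")
    return {**request, **MANDATORY_SETTINGS}


def paved_path_bucket(name: str, team: str) -> dict:
    """Paved-path API: two user fields plus platform opinions, then delegation
    to the complete API, so both layers share one backend."""
    request = {**PAVED_PATH_DEFAULTS, "name": name, "labels": {"team": team}}
    return complete_bucket(request)
```

A newcomer calls `paved_path_bucket("invoices", "payments")`; a team that later needs a different region or storage class calls `complete_bucket` directly and still gets encryption, backups, and versioning.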

I'm going to use a very technical term and I hope everybody's familiar with it - that is sick! 

Now I know I'm talking to a California guy. There you go.

Yeah, sorry about that. So in that paved path scenario, you've got the baseline API where your hard requirements are, and then there's the… I don't know if novice API is right, but it's the easier path, right? Can you make multiple of those for the same service? So if I say, “Hey, these are our rules for S3, but I need a bucket for logs versus a bucket for end-user-generated content…”

A static website or whatever… looks different. Yep, absolutely.

At the end of the day, in the world of Kratix, promises are essentially easier-to-create CRD and controller pairings. And so when you create the lowest level, it's a CRD with potentially hundreds of fields, which makes it quite complex to fill in for someone who's not familiar.

Now, you can make a CRD and controller pairing that just creates a request, a custom resource, of that lower-level type. So you create your lowest-level S3 complete promise. And then you create your next-level promise, like an S3 website promise. That has a CRD, or custom resource definition, with maybe three or four fields in it, but it hard-codes that you're going to have the website box ticked. It hard-codes getting DNS set up and things like that. And then it just makes a request to that lower-level S3 complete promise with those hard-coded decisions plus the ones the user gave.
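To make the composition concrete, here is a hedged Python model of two higher-level promises (a website bucket and a logs bucket) that each emit a request for the same lower-level `S3Complete` type. The resource kind, the `platform.example.com/v1` API group, and every field name are hypothetical; real Kratix promises are defined as Kubernetes resources, and this only mirrors the logic of "a few user fields plus hard-coded decisions, forwarded to the lower level".

```python
def s3_complete_request(spec: dict) -> dict:
    """Build a request (custom resource) for a hypothetical lower-level
    'S3Complete' promise; mandatory settings are always forced on."""
    return {
        "apiVersion": "platform.example.com/v1",  # hypothetical API group
        "kind": "S3Complete",
        "spec": {**spec, "encryption": True, "versioning": True},
    }


def s3_website(name: str, domain: str, index_document: str = "index.html") -> dict:
    """Higher-level 'website' promise: a few user fields, with the website
    setting and DNS record hard-coded by the platform."""
    return s3_complete_request({
        "name": name,
        "website": {"enabled": True, "indexDocument": index_document},
        "dns": {"record": domain},
    })


def s3_logs(name: str, retention_days: int = 90) -> dict:
    """A second paved path on the same backend: a bucket for logs with an
    expiry rule instead of website hosting."""
    return s3_complete_request({
        "name": name,
        "website": {"enabled": False},
        "lifecycle": {"expireAfterDays": retention_days},
    })
```

Both paved paths end up as the same kind of lower-level request, which is why any number of them can be added without touching the backend that fulfils `S3Complete`.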

So yeah, any number of promises can speak to each other and build in those easier paved paths.

That is very cool. You've been in this space for quite a while. You're helping shape like the messaging around the space and the CNCF. What is the hardest part about building platforms in practice, like technically or non-technically?

Yeah, so when you say, “Oh, you're around the messaging,” I think the hardest part is vocabulary and consistency of vocabulary: using words and actually confirming that people mean the same thing when they say the same thing. That's from the messaging side.

From the actual execution side, the thing I'm seeing happen a lot right now is that we're pendulum swinging. We swung from Dev and Ops (centralized operations that control everything) to DevOps, where you have a lot of autonomy and control in your organization. Now we are swinging back towards centralization. Platform engineering is the centralization of your internal services. And I think the hardest part right now about building a platform that is centralized, without bringing back the negativity and the challenges that drove DevOps to become popular, is making sure you don't end up with a single, secluded team that is the only one who can build that platform, or a single group that becomes a bottleneck for those centralized services. So it's about building it truly as a platform that everybody can integrate into, support, and add to, versus something that is locked down and hard to extend.

Yeah, I think a lot of folks are struggling with that. Like one of the things that I've been… I feel like frustrated with for a fairly long time is… Look at the Squirrel. Special guest the Squirrel for the people on video. Look at him. He's very sus about what's going on in here. He's like, “Is this guy talking about DevOps again?” Don't come in here.

It's one of those black squirrels as well. It looks dark; I can't tell if that's just from the window… from the shade.

These are aggressive squirrels here. There's a non-zero chance it just comes in here and runs me out. And so this will be hosted by a squirrel. So I hope you speak squirrel. 

So one of the things that's probably been very frustrating to me… do you see his head? Look at this guy. Okay, sorry. I'm like legit scared of these guys. They're aggressive. Like I'll be walking through the backyard. They're barking at me all the time. These squirrels are no joke. Um, okay. So we'll see if Drew keeps that in – I think it's great.

Um, so one of the things that's always been like a bit frustrating to me is how we lost the thread on DevOps and the definition of DevOps. And I feel like it's gone a lot of ways. And when I first started hearing the term platform engineering starting to be used a lot, I saw an opportunity here, because I feel like we never got the marketing around DevOps right. And I feel like that's why it kind of spewed into like three or four or five or 25 different definitions. 

I saw this as an opportunity for us to get this right. This gives us as operations engineers, software engineers, DevOps engineers, the ability to come back and say, “Hey, people are excited about this idea. Let's talk about it the right way now and actually see if we can start to roll this stuff out in organizations, all agreeing on what we mean by this.” And I feel like we've started to get to the point where the term platform engineer has started to lose its bearings. I've met people that are very much lodged in ticket ops, but they're called platform engineers. And I'm not sure… Have you seen this yet? Where it's like…

Absolutely.

It's almost like a rebranding of the title. 

Evolution Of Platform Engineering From DevOps

Like, what are your thoughts there? Like, what can we do as an industry to make sure that we don't like… I don't want to say squander the opportunity that we have, but we do have an opportunity to really, I think, revitalize the way that we've looked at operations, especially like in a cloud native world. 

I'm always worried that we're going to just kind of backslide. And it's like, “What's that group over there?” “They're the platform engineering group.” “What do they do?” “They write bash scripts for us.” ...and it's just like, “Agh, why are we here again?”

Yeah, I mean, at the end of the day, we are pendulum swinging back to centralized. Like I said, I don't think that's a bad word or a bad situation to be in, but it's happening.

And so, you know, given that I definitely have… I know of organizations, I've worked in organizations, as like a consultant and around, that had never really left centralized Ops. So in some ways it's like they never really quite made the move towards DevOps, and now it's kind of sort of back to centralized, and they're like, “See, you're all back over here. Told you that DevOps thing was no good.”

Welcome back.

Yeah. I don't know. So the first thing I would say is that I try very hard not to judge or talk down to someone who has a different opinion on these things. Words are hard, especially when they are related to in-crowd and out-crowd, and your self-worth, and your salary, and all these things. So I can never fault someone for, “Wait, you're telling me if you change my title from Ops Engineer to DevOps Engineer, I get like a 20K pay bump? Sure.”

Yeah, get it. All day long.

Like, why are you going to fault somebody for doing that? 

No, get that cheddar. Just to be clear, get that cheddar, please.

Exactly. So first of all, yes, I see what you're saying. It makes me sad too, but it's also like a part of the hype cycles of all these words. 

We saw it with… stepping away from our space, we saw it with observability versus monitoring. It gets particularly bad when you get vendors who have been around for many, many years and they sort of go, “Wait a second, don't run away from me now,” and come in and grab that new word and maybe muddy the waters a little bit. Again, I'm not sure… I think we were talking about this before… I'm not sure it's malicious. I just think the impact is confusion, right? And that's frustrating.

Language is hard. It's always changing on us. Like probably a fair number of people go, “That's just what they're calling it now.” 

Yeah, yeah, yeah. It's like, “I've been doing that for years. I've been automating my tasks when someone requests something from Jira for years.” And it's like, cool – That might be called Infrastructure as Code. That might be called efficiency gains. But if someone still has to make a request to you, and if you're on holiday, they don't get it back until later. That's not a platform service. That's not an on-demand self-service solution.

If you hand them your implementation… so you hand them Terraform or Helm or whatever, Crossplane, et cetera, like all the Infrastructure as Code tools… that isn't a service, that is a template. And with templates, they have to do maintenance. That's your puppy for Christmas. How are you going to now keep track of this puppy? How are you going to raise it nice? Don't jump. Don't bark, all these things.

Yeah.

So what we're doing right now and what I'm perpetually shouting about is just trying to get people who are trying these new ways of working really thinking about the principles of platform engineering around self-service, around business specific requirements, around fleet management. So instead of that like template and one off fixing, being able to manage your entire estate. Those kinds of principles. Trying to get people to speak about what they're doing there because it starts to open people's imaginations, I think.
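The template-versus-fleet-management distinction can be illustrated with a small Python toy. `BucketService`, `copy_template`, and the integer "policy version" are hypothetical stand-ins, not any real platform's API: the service keeps track of every instance it provisioned, so a policy change can be rolled out across the whole estate, whereas a handed-over template copy stays wherever it was when it was copied.

```python
from dataclasses import dataclass, field


@dataclass
class BucketService:
    """Toy model of a platform *service*: the platform records every instance
    it created, so it can reconcile the whole fleet when policy changes."""
    policy_version: int = 1
    instances: dict[str, int] = field(default_factory=dict)  # name -> applied policy version

    def request(self, name: str) -> None:
        # Self-service: the consumer asks and the platform provisions at the current policy.
        self.instances[name] = self.policy_version

    def update_policy(self, new_version: int) -> list[str]:
        """Bump the policy and reconcile every instance; return the names that changed."""
        self.policy_version = new_version
        changed = [name for name, version in self.instances.items() if version != new_version]
        for name in changed:
            self.instances[name] = new_version
        return changed


def copy_template(template: dict) -> dict:
    """Handing over a template: each team owns (and must maintain) its own copy."""
    return dict(template)
```

After `update_policy(2)`, every bucket the service knows about is on version 2; a team that took `copy_template({"policy_version": 1})` earlier still has version 1 until someone updates it by hand, which is the "puppy for Christmas" maintenance burden.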

And so for a lot of people, it's just they go, “Well, you tell me I need automation. I'm doing automation. Look at all of my Infrastructure as Code.” And it's hard to do that nuance, which is a challenge.

Yeah, I think with the pendulum swing, the thing that's really interesting is it's been necessary for a while, right? Like when you think back to the origins of DevOps, like our world was very different back then. Like the cloud actually meant something. I feel like the cloud has like lost it… people started saying hyperscaler now instead of the cloud because when people say cloud, it's like, are they talking about SaaS or are they talking about Amazon…

Private, public?

Yeah, it's like, it's very vague. It's like, you know, “Oh, the cloud” and you might totally be talking about Notion or something like that, right? Because people just kind of bucketed everything together.

iCloud.

Yeah, right. That's a cloud too. I use the cloud all the time. 

But you know, when the word was coined, our operations were very different. We were very much still in a VM-oriented world, right? And we've gone through a number of phases: the serverless phase, the container phase, the container orchestration phase, serverless containers, and now there's this new shift to, you know, AI systems and whatnot. The scope of the cloud has gone from my LAMP stack days, when I could hold the entire thing in my head, to a product that might use 27 different AWS services. And that's a lot to hold in a single engineer's head.

I think one of the things that's always been exciting to me about the recentralization, but making it accessible through APIs, is those walls that we tore down in DevOps had a meaning and that was your expertise was in that silo. And what happened when we tore down these walls is people got access to stuff, but they didn't get access to your tribal knowledge or to any of your expertise. And now it's on everybody to be a DBA. It's on everybody to be a bucket security expert, an IAM expert.

And they didn't lose their other requirements. They didn't have to stop being an expert in user experience design and interface design and whatever the software application languages they're using, et cetera. 

I think that's the thing I try to be really particular about when I talk about DevOps. I am not in the "DevOps is dead" camp. I'm not in the anti-DevOps camp. I'm very, very pro-DevOps, as a set of principles, as a way of working, if you are correctly scoped.

Yeah.

Every one of these terms, it's about nuance. The nuance between platform engineering and centralized ops is self-service. You can very easily be like, “I'm doing a platform. I have centralized requests that come in through Jira.” And it's like, that's not hitting that nuance of self-service.

Same thing with DevOps, like the thing that makes DevOps great is that you can be self-sufficient and autonomous in the area where you have the abilities and the interest and the capabilities to manage. And so where DevOps thrives is when application teams develop and operate their software applications. And they depend on a number of things that are developed with a DevOps mindset. So those dependencies that they have are also developed and operated by other engineers who are perpetually paying attention to the user patterns and the user needs and releasing new updates and managing the user experience and all those things - but at the levels of those dependencies. So they're applying DevOps principles to their services, which just happen to be dependencies for the software application developers.

But it's that nuance. DevOps is great if you don't have to be the owner of everything from CSS down to your VM. That's hard.

Yeah.

Let's shorten that stack and make it achievable.

It's a little tough. It's funny, what you were saying a second ago about developers developing and operating, and then there's this other group of developers developing and operating the interface for me, right? One of the things I've always found interesting is, when talking about designing abstractions, whether that's a Terraform abstraction or an API abstraction, this notion of: what if the developers that are using it have a different use case and they have to change it?

I've almost seen this as a concern, especially from people with more of an operations background, where they're not as interested in introducing change to a system; they're interested in the stability of the system, right?

Stability, yep.

And they're like, “Okay, well, what happens if somebody comes along and they do need this to work different than what I've developed?” My opinion of that is that's great. That's fantastic. You've found the exciting part of DevOps. You're going to collaborate. You're going to talk about the system and start to get some of that system design. 

I think, traditionally, when you don't have a platform team where you have a lot of people that have been working on system design and building software, that's a hard leap – to go from, I've been working in operations to we are a platform team. We don't have traditional software experience. Like, “How do I map… I don't know about the principles that I should know about to do that.”

And I feel like that's one of the things that's always the hard part for some folks to get. It’s like, “Well, we already developed this once, why would I make changes to it?” It's like, “Well, because the software changes, it's always changing.” I love the way that you kind of position that. 

I'd be curious on that, for your experience actually, Cory, because I have a lot of thoughts around like inner sourcing and the ability to extend platforms and extend software. If you depend on software, how much can you get involved? Everyone's like, “Inner sourcing will solve all our problems.” Is that true in your experience? Like I'm doing a bit of a poll around people I know to see what their experiences are with that.

When I see teams like that, I love it. Let's say we built our platform, whether it's a Kratix promise or how we do it with bundles; there's an abstraction that's been designed. And one of the things that we say to our customers (sorry, not to talk too much about us) is that we want the engineers using the platform to find places where it doesn't work for them anymore, because that's interesting. You've found somebody whose use case doesn't map to what you had, and that's good; that is product. That is revisiting a product, iterating on a product, and that is a thing that you want.


Now, should that person have to fix it? I don't think so. And I don't think that they should necessarily have to know how it works under the hood. I should be able to open an issue on maybe the repo that has that module inside of it and say, “Hey, this doesn't work for me.” This is me effectively putting in a support request. I'm saying, “Hey, the use case that's been presented to me doesn't work for me.” And I think that is an absolutely valid path to take.

Now, if on that same team, somebody's like, “Hey, this use case doesn't work for me.” and that person says, “You know what? Like I've always been a little bit interested in Terraform or OpenTofu or Ansible or whatever the tool is and I've got the time. Like I'm going to open a PR and not just open an issue.” I'm going to say, “Hey, like this would help me.” That's valid too. 

These are just two different people with two different ways of going about problems. And the reality is, if you say, “Hey, no one from outside of this team is allowed to open a PR,” that sucks. And if you're saying, “Hey, if you have problems, it's your problem,” that also sucks, right? So I think letting people express their differences in the use case and how they need it is how you get to a system that works well. You might not take my PR, and I can't be offended if you don't, but you also have to give me good feedback as to why you didn't.

And now we're in an interesting spot. I have something that works for me. We don't want it in the use case. Okay, do we make a separate Kratix promise for this particular use case and say this is a logging bucket versus a UGC bucket, right? 

That's great. Because now other people may have just been saying, “Hey, you know what? I need a bucket for logs. There was a bucket module there. I just used it. It didn't really work the way I want it to.” Now people are like, “There is one for logs now? That's great.” That's a good place to be.

So I think it needs a mix. I think it really comes down to the team dynamics and the personalities of those teams. If you don't have engineers that have the breathing room, that are working on the product, you can't expect them to grind to a halt and fix the problems themselves. They're there to pay your salary. They're paying for your product in an interesting way. They're making the money that pays the business…

Yep.

That pays your salary to build the platform that they use. They're not paying you directly, but they certainly are paying you, right? At the end of the day, like, this is our hobby, this is the thing we love, we like writing software, but it is all in service of a business moving forward. And if you lose that… if you lose that, we're all gonna lose our jobs.

I think that is an awesome, awesome point you're making about how different users are going to approach things in different ways. And that's why we talked earlier about having different levels of abstraction. Because some users want to go in… some users need to go in and move all the knobs on something, right? They're working in a really highly tuned part of the company, versus others who are like, “Uh-uh, I'm doing the front-end whatever… it's not optimized… not a big deal.” It's exploratory in nature, it's not core to the product, et cetera. They can just be happy with the baseline. So I think that point about different users wanting different things and getting into the tech to different degrees is a really good one, and so is not forcing the inner sourcing model, but enabling it.

The other thing I found really interesting about what you just said, which is definitely one that I'm particularly attuned to, is that with the bundles you all work with, and the promises we work with, you're able to actually extend the platform without affecting other users. In the sense that you can “inner source”: you can attach a new behavior to the platform without asking the existing behaviors to change for all other users, because it's just an add-on. It's just another plugin. It's another service, another option.

That is where I start talking about the value of that democratized platform, the value of not having a centralized blocker, but actually having a system, a platform that enables producers to put things on the platform and consumers to take things off the platform. Because that's what gives that sort of freedom to inner source in an even more powerful way than a PR against a tightly scoped single repository or whatever. So that's very interesting.

Yeah. And the thing that is cool is you can just fork the thing. Like, right? And that's what we tell our customers – just fork it. 

Yeah.

It's built into every Git tool that there is. Just fork the thing. And it's pretty easy. And what's nice about it is if you fork the UGC bucket into the logging bucket, you can continue to pull in changes from upstream. So if people are putting in policies or whatever, you can still get that stuff and pull it in. But if the platform team has said this isn't a big enough use case to warrant being an official part of the platform, don't obstruct people from getting what they need. And what you might see is three, four, five… a bunch of other teams start using this. And now the platform team goes, “This has become a use case that makes sense to the business, and enough people are using it that it makes sense for us to start to take on the management of that.” So it's not a bunch of teams trying to figure it out themselves.

Don't obstruct people from getting what they need.

And your platform should encourage that evolution. So, I'm sure you all have the same idea of packages and versions, right? You know, built into Kubernetes CRDs, you can have different packages. So you can have the officially supported package within your organization. Then you can have the alpha, experimental, whatever-you-want-to-call-it package, and enable things to change packages, right? Or to change versions. Like, it can be v1alpha1 until, you know, the platform team sees it as a useful enough or widely used enough service to then elevate it to v1, right?

These are all ways to stay in control without worrying that things are getting out of hand and everyone's doing everything differently and you don't know what they're doing. If they build on your platform, on your system, all of a sudden you have visibility into the shadow IT.
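The alpha-to-v1 promotion described above lines up with how Kubernetes orders the versions a CRD serves: GA versions first, then beta, then alpha, with higher numbers first inside each tier, and any non-Kubernetes-style names last in alphabetical order. The sketch below is a plain-Python illustration of that documented ordering rule, not Kratix or Kubernetes code; the version names in the example are made up.

```python
import re

# Kubernetes-style version strings: v<major>, optionally followed by alpha<n> or beta<n>.
KUBE_VERSION = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")

# Lower rank sorts first: GA (no suffix) before beta before alpha.
STABILITY_RANK = {None: 0, "beta": 1, "alpha": 2}


def version_priority_key(version: str) -> tuple:
    """Sort key following the CRD version-priority ordering Kubernetes documents."""
    match = KUBE_VERSION.match(version)
    if match is None:
        # Names that don't look like Kubernetes versions sort after all that do,
        # and alphabetically among themselves.
        return (1, 0, 0, 0, version)
    major, stability, minor = match.groups()
    # Negate the numbers so that higher major/minor versions sort first.
    return (0, STABILITY_RANK[stability], -int(major), -int(minor or 0), "")


def sort_versions(versions: list[str]) -> list[str]:
    """Return versions from highest to lowest priority."""
    return sorted(versions, key=version_priority_key)
```

For example, `sort_versions(["v1alpha1", "v1", "v1beta1", "v2alpha1"])` returns `["v1", "v1beta1", "v2alpha1", "v1alpha1"]`: once a service is elevated to `v1`, it outranks every experimental version, which is the "officially supported" signal the conversation describes.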

Yeah.

That is brilliant, right? 

Yes.

Like, you mark it as experimental and alpha, but you still have visibility and it's on the path to production. You don't have to worry about someone doing something off in Heroku, or Vercel, or in the cloud directly with their credit card, and all of a sudden it becomes something that's crucial to the company, critical to the business running. And now you have to operationalize it from nowhere. No, build your platform so that it's easy for them to put those experimental things there, because you want to reduce that shadow IT movement as much as possible, or allow them to be in the fold as much as possible. So yeah, I think that's really interesting.

The biggest pushback that we see from teams that are, like, mildly afraid of platform engineering, and the biggest concern those teams will typically have, is: How do I know what people are doing? Like, if I give them self-service, they'll do anything they want. Being able to have… I don't want to say observability because I know that it's muddy… there we go, some muddy words, but like being able to…

Traceability?

Yeah, like having some way to oversee what people are doing while giving them the room to do it, I think, is one of the things that makes it a bit freeing. It's like, “Okay, I can see that these patterns are starting to coalesce. Maybe that should be a part of the platform.” And that's happening, right? Like, if we rewind three years, people weren't talking about AI and model hosting as much as they were talking about ETL, right? And so now we're starting to get new things that the system is required to do.

If it's just like, “Hey, ask us and we'll build it when we think it's valuable,” that's a weird way to go about it. When teams start to experiment, you start to see patterns in how, say, three different teams are doing their model builds and fine-tuning. Now you can say, “Okay, well, how do we create a nice abstraction that works for the other 85 teams?” And now that can be a part of the platform.
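One way to make "watching the patterns coalesce" concrete is to count how many distinct teams request each experimental offering and surface the ones that cross a threshold as candidates for an officially supported version. The sketch below assumes a hypothetical request record (team, service, version) and a hypothetical threshold of three teams, echoing the "three, four, five teams" mentioned earlier; neither comes from Kratix.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One self-service request as a platform might record it (hypothetical shape)."""
    team: str
    service: str
    version: str


def promotion_candidates(requests: list[Request], min_teams: int = 3) -> dict[str, int]:
    """Return experimental services adopted by at least `min_teams` distinct teams."""
    teams_by_service: dict[str, set[str]] = defaultdict(set)
    for req in requests:
        # Only experimental (alpha) offerings are considered for promotion.
        if "alpha" in req.version:
            teams_by_service[req.service].add(req.team)
    return {
        service: len(teams)
        for service, teams in sorted(teams_by_service.items())
        if len(teams) >= min_teams
    }
```

Counting distinct teams rather than raw requests keeps one very busy team from looking like organization-wide demand; repeated requests from the same team only count once.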

Build your platform so that it's easy for them to put those experimental things there, because you want to reduce that shadow IT movement as much as possible.

It's funny, you'll have engineers that kind of look up to the platform engineers, like, “Oh, they know everything.” And I think a really good platform engineer is saying, “I am learning what we should be doing by watching what you all are doing. And what I'm trying to do is figure out how you all work and how I can create efficiencies and, like, synergies… not to use that word, but… between these different teams.” And I think that's where the good work comes from. Not me saying, “Okay, let me decree that this is how you all run a Postgres database,” right? And people are like, “We run MySQL, dawg.”

Dawg.

There are people that go about it that way, right? And I feel like you can't be prescriptive about it if you want people to adopt it. Instead, you have to see how people are using it and what use cases they're presenting and say, “Okay, how do I take those and bring them in so that they're more accessible to the rest of the teams, and so that we can have our security and compliance oversight and requirements built in there?”

Have you seen the blog “Let a 1,000 flowers bloom. Then rip 999 of them out”? 

No, I have not seen this.

That's one for tagging here. It's a great blog post. It's older, probably more than 10 years old at this stage. I don't know, ages ago, though. It was written by someone, I think, at one of the big tech companies at the time, FANG style. And it was just talking about that. Like, the best technology gets solidified for an organization when they've had multiple teams doing the right thing for themselves and then organically identifying what the patterns are and what the consistencies are, and they get to the point where they're like, “Here is the one path forward.”

And yes, of those 999 flowers that you're pulling out, some are going to come out very easily, and those teams are going to be grateful to stop having to think about it and happy to move over to the centralized option. And some are going to be really painful to pull out, because that technology is part of a team's identity or a big specialty in their product. And so can you get down to one flower? Who knows? All that kind of stuff gets talked about. It's a very interesting concept, though, I think. I've also seen Charity Majors write about it. But she writes about everything well, so I'm sure she's got a blog on it.

You can't be prescriptive about it if you want people to adopt it. Instead, you have to see how people are using it.

Yeah, I'm going to have to read that one. I definitely haven't. But I love that concept, and as a person who's just completely redone their yard, I know the difference in pulling things out... Like, sometimes you pull something and it's like, “That was easy.” And then other times you're like, “What in the hell? This little plant's got like a 20-inch root system. What are you doing under there?”

Exactly.

Technical Implementation Of Kratix

Back to Kratix. So you guys are an open core model, right? Okay, so anybody can just grab it today and toss it… it runs as an operator?

Yep.

Just toss it into Kubernetes. What is the experience? So there's the promises, what is that under the hood? Am I writing promises as a platform engineer, or am I writing Terraform that's portrayed as a promise? Like how does the actual provisioning work under the hood?

Yeah, it's a great question. Kratix is the open core framework for managing your services across your infrastructure.

Okay.

It accepts requests and then fans out the provisioning across your infrastructure, whether it be Kubernetes or elsewhere. The promise is what tells the framework, or defines for the framework, what it needs to do. And so a promise is the embodiment of something as a service, or anything as a service.

So for example, you might have something like database as a service. That would be a database promise. And the reason promises can be generic in concept, even though you're going to have specific implementations, is that something as a service is generic.

When you provide a service, you need a way to accept requests. You need a way to define what needs to be, and can be, part of that request. You need a way to actually take that request in, provision what they asked for, and pass information back to those users so they have what they need to use the infrastructure. And you need a way to have common dependencies across all instances.
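The "accept requests, define what can be part of a request, pass information back" part of a service can be pictured as a small contract. In Kratix, per this conversation, that contract is expressed as a CRD in YAML; the Python below only illustrates the idea, and every field name, allowed value, and the connection-string format are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical contract for a "database as a service" offering.
ALLOWED_ENGINES = {"postgres", "mysql"}
ALLOWED_SIZES = {"small", "medium", "large"}
ALLOWED_ENVS = {"dev", "prod"}


@dataclass
class DatabaseRequest:
    name: str
    engine: str = "postgres"  # sensible defaults so most users set almost nothing
    size: str = "small"
    env: str = "dev"


def validate(spec: dict) -> DatabaseRequest:
    """Reject fields and values that fall outside the offered contract."""
    unknown = set(spec) - {"name", "engine", "size", "env"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if "name" not in spec:
        raise ValueError("name is required")
    req = DatabaseRequest(**spec)
    if req.engine not in ALLOWED_ENGINES:
        raise ValueError(f"engine must be one of {sorted(ALLOWED_ENGINES)}")
    if req.size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if req.env not in ALLOWED_ENVS:
        raise ValueError(f"env must be one of {sorted(ALLOWED_ENVS)}")
    return req


def status_for(req: DatabaseRequest, endpoint: str) -> dict:
    """Information passed back so the requester can use what was provisioned."""
    return {"ready": True, "connection": f"{req.engine}://{endpoint}/{req.name}"}
```

A request of just `{"name": "orders"}` is accepted with the defaults filled in, while something outside the contract (an unsupported engine, an unknown field) is rejected before any provisioning starts.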

So if I'm providing databases as a service, I might need cloud accounts that can run those databases. I might depend on third-party providers being configured correctly to vendor those databases out, those kinds of things. And you need a way to actually schedule this stuff. So you might use PlanetScale SaaS for Production, but a Postgres operator in Kubernetes for Dev. And when a request comes in, you want to be able to schedule some things to Dev, some things to Production, and so on. So those components of something as a service are all defined in a promise.
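The Dev-versus-Production scheduling described above can be sketched as label matching: each place work can land carries labels, and a request is routed to whichever destinations match. This is a generic, equality-based matcher in the spirit of Kubernetes `matchLabels`; the destination names, labels, and the `env` field are hypothetical, and Kratix's own scheduling is configured declaratively rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """A place work can be scheduled to (hypothetical names and labels)."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


DESTINATIONS = [
    Destination("dev-cluster", {"env": "dev", "backend": "postgres-operator"}),
    Destination("prod-planetscale", {"env": "prod", "backend": "planetscale"}),
]


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def schedule(request: dict, destinations: list[Destination]) -> list[str]:
    """Pick destinations whose labels match the environment the request asks for."""
    selector = {"env": request.get("env", "dev")}
    return [d.name for d in destinations if matches(selector, d.labels)]
```

The requester only says which environment they want; whether that ends up on an in-cluster operator or a SaaS provider is decided by the labels the platform team puts on its destinations.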

So what is the platform engineer's experience? It's YAML. It's a custom resource type in Kubernetes. You can define everything as YAML. It's great. Just make sure you get your spacing right. And it all wraps up into a CRD for your interface. So you're writing the CRD for Kubernetes and a set of containers that run as your workflows on request and on setup.

Okay.

Someone has previously described it as sort of controllers or operators for dummies. And I think that's actually really smart, because who wants to be worrying about reconcile loops and when they should run and what their frequency is and how to manage things in an idempotent way, et cetera?

Give us a container. We'll run it for you at the right times. We'll get you the feedback you need. You get the behaviors of a control loop without having to learn Golang or the controller frameworks or those kinds of things. 
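"The behaviors of a control loop" mostly come down to idempotence: a reconcile step computes the desired state from the request and only acts where the actual state differs, so running it again with nothing changed does nothing. The sketch below shows that pattern with an in-memory dictionary standing in for real infrastructure; the resource keys and the `retention_days` field are hypothetical, and deletion of no-longer-desired resources is left out for brevity.

```python
def desired_state(request: dict) -> dict[str, dict]:
    """Compute what should exist for one request; a pure function of the request."""
    name = request["name"]
    return {
        f"bucket/{name}": {"retention_days": request.get("retention_days", 30)},
        f"policy/{name}-encryption": {"enabled": True},
    }


def reconcile(request: dict, actual: dict[str, dict]) -> list[str]:
    """Converge `actual` toward the desired state and return the actions taken.

    Running this again with nothing changed takes no actions (idempotence).
    """
    actions = []
    for key, spec in desired_state(request).items():
        if key not in actual:
            actions.append(f"create {key}")
        elif actual[key] != spec:
            actions.append(f"update {key}")
        else:
            continue  # already matches: nothing to do
        actual[key] = dict(spec)
    return actions
```

Calling `reconcile` twice with the same request creates the two resources the first time and returns an empty list the second time; changing `retention_days` produces a single update. That is the property a framework can rely on when it reruns your container "at the right times".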

Very, very cool. So I mean, YAML is my favorite language. 

There you go.

I'm not even joking. I'm a YAML fiend.

You've got the IDE set to two-space indentation, don't you? You're already set up, ready to go.

Yes, yes I am. Have you seen my YAML stickers? Did I not have those at KubeCon? 

I don't think I got a YAML sticker from you.

My gosh, okay, I'm gonna mail you a pack of my YAML stickers. They're just like glittery and they just say “YAML as fuck.” I love it. 

That's very cool. So, I mean, for people that are very familiar with operating Kubernetes, they're not having to be forced into some language that they don't necessarily know to be able to start implementing this stuff.

You can write them in Python. You can write them in Bash. They're containers, right? And we find that the people who use Kratix and create promises quickly evolve beyond something like Bash, because we also provide testability. We have a set interaction model, with volumes attached to the container, so you can test inputs and outputs and validate that your container is operating the way you want it to.

And so they're like, “Hmm, I want to test this. I want to unit test the code and integration test the entire container. I'd like to write this in…”. I've seen it written in Ruby, I've seen Python, I've seen Golang, I've seen Elixir, Rust… so it's like whatever language you want to script in, you want to write your code in, you can do that because it's just a container.
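Because a workflow is just a container that reads inputs from one mounted directory and writes outputs to another, its logic can be unit tested by pointing it at temporary directories. The sketch below assumes the convention described in the Kratix docs as we understand it (the request at `/kratix/input/object.yaml`, documents written to `/kratix/output`); treat those paths as an assumption to verify. To stay standard-library only, it also assumes the request is serialized as JSON, which is valid YAML; a real workflow would parse YAML. The output document is a hypothetical example.

```python
import json
import sys
from pathlib import Path


def render(request: dict) -> list[dict]:
    """Pure transformation: one resource request in, documents to schedule out."""
    name = request["metadata"]["name"]
    size = request.get("spec", {}).get("size", "small")
    return [
        {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {"name": f"{name}-db-config"},
            "data": {"size": size},
        },
    ]


def run(input_dir: Path, output_dir: Path) -> None:
    """Read the request from input_dir and write one file per document to output_dir."""
    # Assumption: object.yaml holds JSON (valid YAML) so no YAML library is needed.
    request = json.loads((input_dir / "object.yaml").read_text())
    output_dir.mkdir(parents=True, exist_ok=True)
    for i, doc in enumerate(render(request)):
        path = output_dir / f"{i:02d}-{doc['kind'].lower()}.yaml"
        path.write_text(json.dumps(doc, indent=2))


if __name__ == "__main__" and len(sys.argv) == 3:
    # In an image, the entrypoint might pass e.g. /kratix/input /kratix/output.
    run(Path(sys.argv[1]), Path(sys.argv[2]))
```

Splitting the pure `render` function from the file I/O in `run` is what makes both levels of testing cheap: unit tests call `render` directly, and an integration-style test runs `run` against a temporary input and output directory and inspects what was written.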

That's very cool. That's one of the things I've always felt was important for starting to get this done: we can't be forcing people into completely new languages to make it happen. I think that's one of the things that's really hard to do, especially when you're a team that's starting to look towards platform engineering and you're underwater. The last thing you want to do is… you know you're going to have to learn a couple of new tools, but all of a sudden being like, “And you’re going to have to…”

Good luck.

“And you're going to have to learn Perl 3.” And it's just like, “I don't have time.”

And you're going to have to write it all over again. You've already done all the hard work. You have a whole bunch of Terraform. You have a whole bunch of Ansible. You have a whole bunch of Bash, et cetera, et cetera.

Should you have to rewrite it all? No. But I think one of the pieces of this is the thought process when you step outside yourself as a platform engineer and think about the organization. A healthy platform engineering team might be 10% to 15% of the org, -ish, or of the engineering team, right? Depending on the scale of the team and so forth. Those are the numbers I sort of see and hear. If you have to learn a new language, that's painful. But what happens to the 85% or 90% of the org that needs to learn the new language if you change it and they interface directly with that implementation?

Hmm, woof.

So I think one of the key things here is that it's an abstraction, just like when you talk about your bundles. A promise provides a CRD; it's YAML. Now, it's going to look really similar to a Helm YAML file, like a values file, right? Because it's just YAML: you give me some values, and we're off to the races. The difference is, when you want to switch from Helm to Kustomize, it's still a YAML file with the values as far as that 85-90% of your organization is concerned, even though you as a platform team have identified a hopefully legitimate reason to do a rewrite into a new language.
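The "service, not implementation" point can be shown as a stable consumer-facing entry point that takes the same values no matter which implementation sits behind it. In the sketch below, the two renderer classes are hypothetical stand-ins for a Helm-based and a Kustomize-based implementation; they do not call either tool, and their output strings are made up purely so the swap is visible.

```python
from typing import Protocol


class Renderer(Protocol):
    """Implementation detail owned by the platform team."""

    def render(self, values: dict) -> str: ...


class HelmStyleRenderer:
    def render(self, values: dict) -> str:
        # Hypothetical stand-in for templating a chart with a values file.
        return f"helm:{values['name']}:{values.get('replicas', 1)}"


class KustomizeStyleRenderer:
    def render(self, values: dict) -> str:
        # Hypothetical stand-in for patching a base with an overlay.
        return f"kustomize:{values['name']}:{values.get('replicas', 1)}"


def provision(values: dict, renderer: Renderer) -> str:
    """Consumer-facing entry point: the same values whatever the implementation."""
    if "name" not in values:
        raise ValueError("name is required")
    return renderer.render(values)
```

Swapping `HelmStyleRenderer()` for `KustomizeStyleRenderer()` changes only what the platform team maintains; the values dictionary the other 85-90% of the organization writes, such as `{"name": "web", "replicas": 2}`, stays exactly the same.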

You are providing a service, not an implementation.

I think that abstraction, that realization that you are providing a service, not an implementation, that really is one of those nuances that if we lose it, platform engineering will lose a lot of its potential for impact in the industry, I think. So I really want to see us keep that nuance popular and in the forefront of people's minds.

Open Core Model And Pricing Philosophy

The open core model though, so this is something that obviously I'm interested in based off of all the Terraform and OpenTofu stuff that happened recently… How are you all thinking about this open core model?

So there's a company that owns Kratix, right? Like there's Syntasso. So like, how are you deciding what goes in the open source product versus the commercial product? And like when does an open source user start looking towards your managed solution?

Yeah, I think it's the right question to ask, first of all. Everybody should be asking this of every open source oriented vendor, because we have seen too many kind of go sideways. 

So first of all, as you said, there's already that split. So we do have the setup to be able to go into a foundation when the product is ready for that. So Kratix is its own brand. It can be in a foundation with Syntasso being kind of the vendor behind it. So this was a thought process from the very beginning. This isn't an afterthought from our point of view.

Hell yes, by the way.

What you're talking about is a philosophy and our philosophy is everyone should be able to build self-service platforms that incorporate their business needs and are fleet managed and enable them to move fast, to be safe and to be efficient. So the framework that enables that, and the abstraction of a promise that feeds into that – completely open source. All Kratix, all open source. 

Now some companies value their time and their energy and their building of platforms differently than others. Some have the elbow grease and the time to spend to build it themselves. Cool, go open source. For the organizations that want to move faster, that want to get started quicker, that want to integrate more deeply with their third party dependencies without having to own those dependencies and those integrations – that's where enterprise comes in.

Very cool.

Like we have an integration that keeps up with Backstage. I mean, they're releasing things all the time. Our integration is making sure that we're keeping up with the most modern version of Backstage, and you don't have to even think about your portal anymore. You get that user experience just by building your APIs in promises in Kratix. That's an enterprise feature. 

You can still use Backstage with the open source, but now you need to be aware of when Backstage changes its API. You need to create those components for Backstage from the API, et cetera. 

So for us, it's all about like accelerating the adoption and creation of platforms. That's what the enterprise offering comes with. And of course, all the other things, account managers and security and more secure builds and more kind of long soak testing. All those things that you kind of expect, but the philosophy is around kind of speed to market with your platform and maintainability over time.

That's very cool. So I can get started on it. And then eventually I get to the point where I'm like, “Hey, there are a lot of integrations with our third parties that we're using. We're starting to spend a lot of time on this. We have our own stuff that we want to be able to focus on.” And that's where I start looking... I mean, obviously security and whatnot… but that's the point where I'd start moving towards Syntasso.

Yep. 

There are these components that I definitely need. I don't necessarily want to be on the hook for them though because we have our own stuff that we're dealing with. 

Yep.

That's very cool, and people have the freedom to develop, and they're not like, “Oh, I'm gonna use this tool a bit,” and then it's like, “Oh, I have to buy, I have to buy it if I want to actually do this in prod.” It's just: I can put it in prod today.

Absolutely. And actually, that's something we thought about for our pricing model as well. One of the things we were really thoughtful about is that we didn't want to drive bad architectural behavior through the pricing model of the software that you use and depend on.

We didn't want to drive bad architectural behavior through the pricing model of the software that you use and depend on.

I love you guys!

How many times have you looked at a pricing model and been like, “So you're telling me that I have to pay per server that I run, which means I just need to bin pack a single server with like a million things so I can only run a single agent, which will save me a ton of money.” Or, “You're telling me it's so expensive to run this logging tool that I can't run it in both Dev and Prod. So I actually need my teams to learn how to use this open source tool in Dev and the proprietary tool in Production because of pricing.”

You know, those things are not things we want to do. And so when we talked about it, we talked about like, you know, number of promises, or number of infrastructures that you are managing, or number of requests that come into the platform. All of those will drive bad behaviors. 

We want the same thing that our users want, which is productive platforms. So we just go with a flat fee for the year plus some seats. That way, if your users are happy and using things, you pay for it. If they aren't, if we aren't getting widespread adoption because the platform isn't providing what you need, then we don't get widespread seat usage.

So we're trying to keep things simple and trying to make sure that people can build the right platforms for them and not the right platforms for our pricing model, which is silly.

I love this. One of the things I detest in our industry is people's pricing models. 

Yeah. It's hard work though to be fair. Like I'm sure that in like six months… a year, I'm going to be like cringing at ours and we're changing it, improving it, whatever. So we're working to make sure we keep our eyes open is what I'll say. But we're trying with the…

Yeah, you're gonna be like, “Hey, Cory, can you bleep out that whole like last five minutes?”

“Hey Cory, remember that podcast interview six months ago? So we've learned that was silly. Can you…”

I mean, I think we've had nine different pricing models since we've launched. Like it is hard to get right.

I feel like we're very similar. Like we're thinking of this from like… not just a business, like we have to make money obviously, but also like we're thinking about the people that we know are on the other side of these budgets. 

We were the people on either side of these budgets for more of our career than not, you know.

Yeah, it's hard to get these budgets, right? And there are a lot of ways that you can price things where, as a business, you can make a lot of money, but you will introduce bad behaviors, right? I’ve seen a lot of things where it's like, we’ll limit the number of deployments.

Mmm.

And it's like, “You just told me how agile I can be.”, right? So it's just like… or based on the number of resources that you manage. And it's like, “Well, you just told me that if I have stable infrastructure, you're going to penalize me.”

Yeah.

That legacy stuff makes money. I'm not changing it very often, but like, “eh.” And I feel like we've gotten so much pressure from VCs like, “Do something usage based, really get in there and fucking drain people's wallets.” And it's like, “Dude, these people have to fight for budgets so much harder than the rest of the teams.” We want them to be happy, and not tell them how to do their job via a pricing plan. And I love that you guys think about that and put that into your work.

I think that mentality is what you need. Like we might get it wrong. Just like you said, you've been through nine versions of this. We've been through a couple of versions ourselves. I'm sure we have more versions to come. But like when you have the mindset of “We want you to do your job and we want you to be getting value in the design patterns you want to use in your organization. And if we are ever getting in the way of that, we will evolve.” Like that's the key, right? Because getting it right is hard work.

Yeah.

And it takes time and you learn your lessons as you go. But it's that mindset that I think matters. And so that's what we try and keep at the forefront.

Yeah, that is very awesome. 

Well, I know that we're at time. I really appreciate you coming on the show. I am going to immediately have an email sent to you to see if we can get a second time, because I have so many questions I wanted to talk to you about today.

I've talked too much, that's what I've learned, okay.

No, I'm the fucking worst at it. We're like a third of the way through and I'm like looking at these and I'm like, “Oh my gosh, there's so many other ones.”, but like maybe I'll just send them to you and you can just write like 45 blog posts for us and that would be great too.

We'll figure something out, I'm sure. This has been awesome. As you say, our philosophies are so aligned. It's so good. Every time I get to chat with anybody from the Massdriver side, I love it, I think it's great.

Awesome, so where can people check out the projects and where can people follow you online?

I'm online. LinkedIn is probably the easiest, but you can find me on all the socials, Bluesky, Mastodon, et cetera. I have handles and things. The project is Kratix.io and the company is Syntasso.io, and you can jump into both of those. So get free keys to trial Syntasso, or just jump into the open source project. And you'll get links there to our Slack community that you're welcome to join as well. It's a growing community of people who are building promises and talking about how they're doing it.

Heck yeah, I love it. Awesome, well thanks so much for coming on the show. Make sure to check it out and we'll see you guys next time.

Links

Featured Guest

Abby Bangser

Founding Principal Engineer | Team Topologies Advocate | CNCF Ambassador