Trust, Lock-in, And Better Infrastructure Management
Infrastructure Management, Scaling, and Trust
Why do 70% of organizations still struggle to adopt Infrastructure as Code? Sören Martius, CPO and co-founder of Terramate, joins Cory O'Daniel to tackle the challenges of modern infrastructure management and the delicate balance between vendor trust and lock-in.
The conversation explores practical solutions for common infrastructure challenges, from managing monolithic state files to orchestrating complex deployments. Martius shares insights on:
For teams wrestling with infrastructure complexity or evaluating new tools, this discussion offers practical perspectives on building scalable, maintainable infrastructure while avoiding common pitfalls around vendor lock-in and team adoption.
Hey everyone, welcome back to the Platform Engineering Podcast. Today I've got with me Sören Martius, CPO and co-founder of Terramate, a buddy of mine. Welcome to the show, man.
Thanks for having me.
Want to do a little intro about yourself, a little bit about your background?
Yeah, sure, let's dive right in. So, hey guys, I'm Sören. As Cory already said, I'm CPO and co-founder of Terramate. I have a background in engineering and have been in the space for around 15 years.
What is Terramate? Terramate is basically an Infrastructure as Code management platform, and we do two things. We have a CLI that helps organizations break up large state files into multiple smaller units that they can orchestrate, and keep those units DRY with code generation. And there's a cloud platform on top of it that adds features for observability, drift detection, better collaboration, self-service, those kinds of things.
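To make the orchestration part concrete: once a large state is split into smaller units (Terramate calls them stacks), something has to run those units in an order that respects their dependencies. The Python sketch below is not Terramate's implementation; it assumes a hypothetical set of stacks (`network`, `iam`, `database`, `cache`, `app`, `monitoring`) with hand-declared dependencies, and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to compute one valid apply order plus groups of stacks that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical stacks mapped to the stacks they depend on (which must run first).
# In practice this ordering would come from your orchestration tool's config.
STACK_DEPENDENCIES: dict[str, set[str]] = {
    "network": set(),
    "iam": set(),
    "database": {"network"},
    "cache": {"network"},
    "app": {"database", "cache", "iam"},
    "monitoring": {"app"},
}


def apply_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid sequential order for applying the stacks."""
    return list(TopologicalSorter(deps).static_order())


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group stacks into waves; stacks within a wave do not depend on each
    other, so they could be planned or applied concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(apply_order(STACK_DEPENDENCIES))
    for number, wave in enumerate(parallel_waves(STACK_DEPENDENCIES), start=1):
        print(f"wave {number}: {wave}")
```

With the sample map, `iam` and `network` land in the first wave and `monitoring` in the last. The dependency map is the price of splitting: each unit is smaller and independent units can run side by side, but someone has to declare and maintain those edges.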
Yeah, breaking up big monoliths, Terraliths, that is necessary.
I wish I had invented that word, though.
Oh, Terralith?
I think it was Matt, and he did a great job.
That is a really good word. But I mean, it is a thing, and it's really interesting. I feel like there are organizations I go into that do a very good job of keeping state tiny. And then there's… I feel like it's almost from a certain era of Terraform and IaC, where you have these larger Terraliths.
What are you guys seeing as you work with different enterprise customers? Are you still seeing a lot of larger Terraliths? Are people creating them net new, or is this a relic of 2017, 2018?
Yeah, so here's the thing. They should be a thing. And they should be a thing because when you get started with Terraform or OpenTofu, I think it's way more straightforward to keep complexity low, and then naturally you end up with a Terralith at some point.
So maybe you start by deploying one or two environments, perhaps in a single state file. That might go well until your state grows to a size that's just not manageable anymore. But until then, it's totally fine, because the price of breaking up state files, and thus making them more manageable in terms of runtimes, blast radius and whatnot, is complexity. So I would actually advocate for any organization that is just getting started, or that has a manageable amount of infrastructure, to stick to a Terralith and manage everything in a single state file.
Yeah, I mean, that is pretty valid. It is a good way to get started, right? Personally, I start pretty small, but if you're newer to the tool, the overhead of just “doing it the right way,” of having smaller states, can be exhausting. It is tedious, right? You have to think about how you pass all this stuff from one module to the next, or you're copying and pasting it. If you can just reference addresses in one big bag of Terraform, it's much easier to reason about.
Yeah, totally. And look, what I see quite often is that there are always two types of organizations. There are organizations that may not be that experienced with Terraform: they run into this issue for the first time and then look for a solution, and the space is so fragmented that there are, I don't know, a couple of dozen approaches to managing this. Or you deal with a rather experienced team that may have hit this kind of issue a couple of times in the past, and they know exactly what to do.
So the problem, I think, is fragmentation in the space. There are so many different approaches to doing this. And the confusion kind of starts with HashiCorp itself: Terraform workspaces, CLI workspaces versus cloud workspaces, two fundamentally different concepts that honestly took me at least two years to tell apart.
Hahaha
But yeah, then there's directories, tfvars, partial backend configuration, Terragrunt - those guys have been around forever, right? Also a great tool.
I think what that leads to, coming back to your initial question about what this looks like in enterprises, is that there's no consensus. You'd be surprised how many different environments I see, and how many of those do not follow what the community, or we, would consider best practices.
Yeah. I mean, I've seen a lot of stuff that I would consider very weird.
[both laugh]
But I mean, it's one of those things, it's only weird if it doesn't work for your org.
I think one of the things that's extremely hard about IaC is that it's tied to how your organization's DevOps processes work. If you're far along and doing self-service, that is a much different picture than if you're very Ops-oriented and Ops mostly owns the infrastructure, right?
Yeah.
So it's very hard to look at a tool, a language, or even a paradigm and say, “This is how you should do it,” if you don't understand the socio-technical makeup of that company.
Yeah, I agree. Also, engineers naturally tend to build things themselves. So it's understandable that when you approach a problem as an engineer, you try to wrap your head around it and figure out how to solve it yourself, which, to be honest, in our space often ends up in do-it-yourself solutions that may not scale.
So this classic build-versus-buy decision is something I see heavily discussed in our industry. And again, it all comes back to how experienced your team is and how many times they've already hit the wall, right?
Yeah, but ultimately you're right. Take the whole DevOps movement. I think we can agree that almost every single organization wants to move towards self-service, right? It's the big trend right now, and apparently everybody's calling it Platform Engineering. But you're right in the sense that there's no single approach, even though we call it the golden path.
It very much depends on how the culture in your engineering organization works: the kind of risk you're willing to take, because it's all trade-offs at the end of the day, and also how technically sophisticated you are.
The point I want to emphasize here is that as your engineering organization scales, an approach that works for 20 or 50 people will surely not work for 100, 250, or 500. So it's an ever-evolving process, and along the way there are so many highly opinionated tools in our space that only work for a certain stage of this journey, mostly because they're opinionated about how to manage environments and how to manage your code.
I've seen a bunch of teams, throughout the lifecycle of the company (two years in, four years in, five years in… or, as we like to look at it, seed stage, A stage, B stage), iterate on their approach, and it often leads to ditching the old approach, most likely leaving it as it is and starting from scratch.
Which, honestly, from a CTO's point of view, I understand. Migrations, specifically Infrastructure as Code migrations, don't provide that much business value compared to the investment, especially if you're onboarding non-codified assets and those kinds of things. There are vendors that solve that now, but they come with a pretty heavy price tag.
So yeah, I guess the bottom line is that there's no one-size-fits-all solution or approach, and everybody has to figure it out on their own, unfortunately.
Yeah, yeah. So I'm going to do an awkward segue to something that I feel like you and I have been wanting to talk about for a while. This is not rehearsed.
So I feel like every time we see each other… just to be clear, everybody, we live… well, right now we're oceans apart.
Hahaha
Sören is back in Europe at the moment, but he normally lives just 30 minutes west of me. But we only see each other like once a quarter for some reason. We're going to fix that when you're back home.
Usually when you invite me to the bar.
So we keep saying we want to talk about the state of the industry. And I would love to have a little bit of that conversation now and catch it on the show.
I mean, I have a lot of opinions on the state of Infrastructure as Code. I feel like adoption has stalled a bit. But I'd love to know where you think we are, especially with AI and both of our companies growing. Are we in a period of innovation, stagnation, consolidation? What is happening in the world of cloud management and infrastructure?
To me it feels like we're moving backwards. I do think it's stagnation, if you want to put a label on it. I think what is happening is that when it comes to new trends, AI, LLMs, those kinds of things, there's a lot of hesitation in the industry. And there's a lot of hesitation because infrastructure, and I mostly mean Infrastructure as Code, is sort of the heart of your entire engineering organization. So there is fear of breaking things, fear of breaking things in ways that directly translate into lost business, right?
If I compare this to other categories, even just looking at our own teams inside Terramate, like data engineering, man, there's so much more innovation and so much more getting out the door, right? Whereas in DevOps and Platform Engineering, it's a lot of talking. It's a lot of vendors, obviously advocating for their own approaches. But there's actually very little innovation happening.
That's how I feel. So I'd love to hear your thoughts on this too.
Yeah, honestly, I definitely think we're in the stagnation phase. And that's one of the things we try to do at our company: as far as adoption goes, we're trying to address that stagnation.
I feel like the numbers are there. When you look at the State of CD report, Infrastructure as Code adoption has sat pretty flat for the past couple of years.
When I meet customers that aren't doing Infrastructure as Code today, I talk with them about why that is. And it's a lot of, “We don't know how to do it, we don't have time to learn it, and so we just click around in the AWS console.”
I mean, it's one of those things where I don't understand why. We're all engineers. We all have to sit down and learn these things. Why do people tend to choose ClickOps rather than starting to codify things and getting the benefits that come from it?
I find it concerning, because we're all putting stuff in the cloud and, you know, something like 70% of orgs haven't adopted Infrastructure as Code yet. And it's like, how inefficient are those teams at managing it… or are they super efficient, and we're the ones going about it the inefficient way, trying to codify all this stuff instead of just clicking it?
It's been pretty stagnant. I would love to see those adoption numbers go up. I think, you know, part of the pain there is that we don't have as many people in the Ops space, with that deeper cloud experience, as we do engineers. And so there are almost two things that any given engineer has to learn.
Look at an earlier-stage company, let's say Series A, with 15, 20, 30 engineers, and maybe they've been deploying on Heroku or Vercel or whatever for years. And now they're like, we have to move into AWS for some service that we need. It's like, “Okay, well, I have to learn how that service works, and I have to learn how this Infrastructure as Code tool works and how to organize it. Do I organize it Sören's way or do I organize it Cory's way or do I organize it…?” There's a lot of stuff where you're just like, “I can just click this shit in the AWS console and it just works, and we can get back to building the business.”
I feel like that's where many companies get stuck in the stagnation around adoption, and then they sit there until they reach the point where they're like, we finally have to do something about this. And when they've reached that point, it's also a hard one to get over, because it took so long to get there.
It's like, “Well, I mean, it's been working. Why don't we just keep doing… Why don't we just keep ClickOpsing things?” And you see this with drift tools. Teams adopt them, and people are still clicking. Why? Because there's still a lot of friction around getting the code in place to get me to the thing that I want, instead of just going and getting the thing that I want.
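Drift detection, at its core, compares what the code declares with what actually exists in the account, which is exactly where console clicks surface. The Python sketch below illustrates the idea under simplifying assumptions: resources are plain dictionaries keyed by hypothetical Terraform-style addresses, and the live side is hard-coded rather than read from a cloud API. It reports three kinds of drift: changed attributes, resources declared but missing, and resources that exist but were never codified. Real tools, including a scheduled `terraform plan` or `tofu plan`, do considerably more.

```python
from dataclasses import dataclass

# Hypothetical shape: resource address -> attribute name -> value.
Resources = dict[str, dict[str, object]]


@dataclass(frozen=True)
class AttributeDrift:
    address: str
    attribute: str
    expected: object
    actual: object


def changed_attributes(desired: Resources, actual: Resources) -> list[AttributeDrift]:
    """Declared attributes whose live value differs from the code.

    Only attributes present in the declared config are compared, so live-only
    defaults are ignored. Resources missing entirely are skipped here.
    """
    drifts: list[AttributeDrift] = []
    for address, declared in sorted(desired.items()):
        if address not in actual:
            continue  # reported by missing_resources instead
        live = actual[address]
        for attribute, expected in sorted(declared.items()):
            value = live.get(attribute)
            if value != expected:
                drifts.append(AttributeDrift(address, attribute, expected, value))
    return drifts


def missing_resources(desired: Resources, actual: Resources) -> list[str]:
    """Declared in code but not found live (for example, deleted by hand)."""
    return sorted(set(desired) - set(actual))


def unmanaged_resources(desired: Resources, actual: Resources) -> list[str]:
    """Found live but not declared in code (for example, created via ClickOps)."""
    return sorted(set(actual) - set(desired))


if __name__ == "__main__":
    # Hypothetical data; a real tool would read `actual` from the cloud provider.
    desired = {
        "aws_security_group.web": {"ingress_port": 443},
        "aws_s3_bucket.logs": {"versioning": True},
        "aws_iam_role.ci": {"max_session_hours": 1},
    }
    actual = {
        "aws_security_group.web": {"ingress_port": 22},  # changed in the console
        "aws_s3_bucket.logs": {"versioning": True, "region": "eu-central-1"},
        "aws_instance.debug": {"instance_type": "t3.large"},  # clicked into existence
    }
    print(changed_attributes(desired, actual))
    print(missing_resources(desired, actual))
    print(unmanaged_resources(desired, actual))
```

Comparing only declared attributes is a deliberate simplification: it keeps provider defaults (like the bucket's `region` above) from showing up as noise, at the cost of missing changes to attributes the code never set.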
Yeah, I think you and I talk to different companies, because when people consider Terramate, they've already been through so much trouble and so many issues with Infrastructure as Code. So naturally, the organizations we talk to are not necessarily ones that are still deciding whether or not to adopt Infrastructure as Code, whether it's Terraform, OpenTofu, Pulumi, whatnot.
So, because of what you guys do, you may talk to a different type of company than we do. What we see, though, even in the segment of companies that have already adopted Infrastructure as Code and are trying to solve scalability, usability, self-service, developer experience issues, whatnot… take the problem of somebody going to the GUI, to the AWS console, and just changing something by hand. That's a cultural issue.
That's a cultural issue, and then it's a tech issue. Why? Because as an engineer, I obviously don't want to have anything to do with Terraform. It's complex. HCL does not naturally fit into my spectrum of knowledge if I work with TypeScript. Then there's also the underlying complexity of configuring cloud services; that's a whole different thing. So for me to go through the process of making a change, even as an experienced user, it takes me 20 or 50 times longer than just doing it in the AWS console.
But the issue doesn't start there, right? If you want to prevent this, you have to put an IAM constraint in place and basically forbid your users from doing that. Eventually this will probably lead to developer dissatisfaction: they will not be happy, everybody will complain, tickets will back up, those kinds of things.
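The IAM constraint described here is often written as a deny statement that blocks mutating API calls for everyone except the role a CI pipeline uses to apply Infrastructure as Code, for example as an AWS Organizations service control policy. The Python sketch below only builds such a policy document; the role name `iac-apply` and the action list are hypothetical, and a real rollout would need exceptions for other automation plus a break-glass path, which is where the friction comes from.

```python
import json

# Hypothetical role that CI uses to apply Infrastructure as Code changes.
CI_ROLE_ARN_PATTERN = "arn:aws:iam::*:role/iac-apply"

# Illustrative mutating actions. Bucket-level S3 patterns are used so that
# applications can still read and write objects (s3:PutObject is not denied).
MUTATING_ACTIONS = (
    "ec2:Create*", "ec2:Delete*", "ec2:Modify*", "ec2:RunInstances",
    "rds:Create*", "rds:Delete*", "rds:Modify*",
    "s3:CreateBucket", "s3:DeleteBucket*", "s3:PutBucket*",
)


def deny_manual_changes_policy(
    actions: tuple[str, ...] = MUTATING_ACTIONS,
    ci_role_arn: str = CI_ROLE_ARN_PATTERN,
) -> dict:
    """Build a policy that denies the given actions for every principal
    whose ARN does not match the CI role pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyChangesOutsideIaC",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {"ArnNotLike": {"aws:PrincipalArn": ci_role_arn}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_manual_changes_policy(), indent=2))
```

Keeping the S3 patterns bucket-level is a deliberate choice here: a blanket `s3:Put*` would also deny `s3:PutObject` and break applications that write to buckets.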
It's such a tough question to answer. For me, I haven't seen that many organizations hesitate when it comes to adopting Infrastructure as Code. They may not have the knowledge, they may not have the manpower, and they may be overwhelmed by a tooling landscape that is insanely fragmented. I just posted about this on LinkedIn today. It's like, you know, every day there's a new Terraform tool that does something better than… I don't know, the previous tool, you know what I mean?
Yeah.
And it's really hard to choose the right tool. So I think some organizations are just like, “Wow, this is so overwhelming. We don't know how to handle that.” So naturally, they may not adopt it. And then, to sum up this epic monologue… sorry.
[Cory laughs]
If you look at the market in terms of technology and innovation, Pulumi Cloud is far superior to Terraform Cloud. It's insane! If you look at the product, just from a… and I heavily advocate for Terraform and OpenTofu because I like the ecosystem and the principles behind the tool… but Terraform serves a different user persona: it serves Ops-centric teams. Pulumi serves developer-centric teams.
The question I have is why Pulumi hasn't been adopted more than it has. It's crazy! The tooling is good. The language is good. The cloud platform is so much better than most of the TACOS available. So I'm trying to wrap my head around it… but I haven't been able to answer it yet.
Yeah, I don't know either. I've actually seen the cloud platform, and it is pretty good. It's funny, because we work with a bunch of different IaC tools, and we've literally never had somebody show up and say, “We use Pulumi.”
And I don't know why that is. I mean, I personally know why I don't like the tool itself. I think the cloud platform does have tons of features over Terraform Cloud. That's one of the things that's always a little confusing to me. Why isn't it more popular?
So my hypothesis, sorry for jumping in…
No, no, give it to me. I was going to say mine, but I think mine is a very weird hypothesis. But maybe it's the same.
So my hypothesis is that the long tail of the market that has the knowledge to manage infrastructure is Ops people, or people with an Ops background. And they come from Terraform. They prefer the Terraform approach. They prefer the large ecosystem, the community, et cetera.
So teams that try to shift the responsibility onto the developers would, I think, most likely choose a different approach, whether that's Pulumi or any Kubernetes-native approach: Crossplane, Google's Config Connector, and I think AWS released a tool the other day, though I haven't looked at it yet.
But the problem is, again, tooling only solves the issue on a technical level. The knowledge that you have to obtain in order to get whatever business value out of the tool is not the knowledge that developers have. So that's why I think Pulumi is struggling with adoption.
Who knows, maybe it looks different. Maybe I have a totally wrong perspective on it, because I don't actually know how much revenue they make or how many organizations use them, and they have some quite impressive logos. But I see that the market itself is heavily dominated by Terraform and OpenTofu.
Yeah, my gut on why it is: I think it's stuck in the same problem as the stagnation, right? Again, if I'm a developer (and I am a developer; I did Ruby development for years) and I go and look at… let's say I don't have any cloud experience, or I've got minimal cloud experience… there's all this stuff I still have to learn about the cloud, and now I have these tools I have to look at. And I might say, “Oh, Pulumi, I can write this in Ruby. That's nice.” Well, that doesn't make me a cloud expert, right?
Having a language that I'm familiar with to manage the cloud might make it easier for me to get started, but I still have this knowledge gap that I need to close. And to me, syntax is syntax. One of the reasons I personally don't like it is that I don't need a lot of logic in my infrastructure configs. When you start getting to the place where you're writing “if” and “else,” you might be overthinking your config. At the end of the day, it's just an API call to set some stuff up, right? It's fancy JSON.
And I'm personally just not a fan, because now I have to process logic instead of just saying, “Hey, here's the…” It doesn't quite feel as declarative as what I'm used to. I don't know if that translates to other engineers, if they're like, “Oh, it's code. Do I have to write tests around all this code?”
Yeah. I still think that most engineers who do not have an infrastructure background don't really care whether it's Terraform or Pulumi. They just don't want to do any infrastructure. You know what I mean? It's like, “Oh, why am I responsible for writing this microservice? Why am I supposed to launch a VPC and a database now, and then also manage it on a day-two level? That doesn't make any sense. That wasn't in the job description.”
Yeah. You know what's really funny about this whole “developers don't want to do it” thing? I've said this for quite a while, and you're saying it too. I don't know if you've had this at all, but when we first started fundraising, the number of VCs who were like, “They very much like to do that.” And I'm like, “I fucking don't think you've ever talked to a developer in your entire life.” And she's like, “They don't?”
And some of them might be interested in it, but the majority are not, right? And I think that's not a problem. I feel like that's something that got mixed up in wherever the whole idea of DevOps has ended up. It's like, we tore down all the silos, so everybody does everybody's jobs. And it's just like, I don't know, if everybody does everybody's jobs, nobody's doing their job well. It's difficult to understand who owns what.
I think a lot of developers like to build stuff in the toolset they're familiar with, and they're not super excited about doing infrastructure. And if they are, you'll probably see them start to take a career path like we took, where it's like, “Oh, this is more interesting to me. I'm less interested in the product and more interested in building developer tools.” That's kind of the way I went.
But I think most developers are just fine not doing it, because at the end of the day, when you're building features and building a product, configuring a database doesn't really feel like the value add compared to the thing a person is going to click and get some joy out of.
Yeah.
And I feel like, as a community of developers, that's what makes us go: knowing that we're building something another human is going to use. And it's like, “I configured Postgres.” “Eh, okay, whatever.”
You're totally right. And I think this whole initiative of moving towards self-service is largely about the initial deployment of infrastructure. I mean, naturally, ongoing management of infrastructure is kind of the promise of the cloud too, right? Like, okay, it scales up and down whenever you need it, it's self-healing, there's very little management effort to put in. But then, yeah, the reality quite often looks different.
Looking at our usage, I do see that the frequency of Infrastructure as Code usage declines as a service gets older. So naturally you spend more time writing new infrastructure code and deploying new infrastructure than maintaining existing infrastructure. I'd say you can agree with that.
The second thing: you talked to a bunch of VCs, and they were like, developers would very much like to do that. Dude, in our space, there are so few VCs that actually understand the space well. Okay? And if you compare Europe to the US, there are even fewer.
As a venture-backed company - you're a venture-backed company, right? Massdriver is a venture-backed company, so is Terramate - like you really want to make sure that whoever you partner up with understands the space very well and is not only excited about a 15x multiple.
What I learned during my initial fundraise, and the fundraise that is now led by my co-founder Chris (who's CEO of Terramate, great guy), is that there's this huge top of the funnel of people who want to talk to you, and then, as you qualify them and figure out who really knows the space well, you're down to a few.
Before we started recording, you mentioned you were actually going back home to start fundraising. How are you thinking about finding the right VC for your next round? How are you trying to find the people who are familiar with the space?
I'm indeed back in Berlin right now; we even have an office here. I'm usually based in California, in LA as well, as you said. Right now I'm here and it's… yeah, obviously I wish I were in Cali, it's full-blown winter here. But at least I'm focused.
When it comes to the round, we're raising a seed round now. We previously raised a pre-seed. We haven't really started yet; we did the deck and stuff, and we're having first conversations.
My sentiment, the way I look at it without having any data, is that we're in a time where companies, specifically organizations built around AI that democratize entire categories, grow from $0 to $10 or $20 million ARR in a couple of months. Previously, I think Wiz was the fastest-growing SaaS company of all time; I think now it's Cursor (sorry if this is wrong information, but I'm pretty sure it's Cursor). And then there's a bunch of others that reportedly got to $10 or $20 million ARR in three months. If you look behind the curtain, you see high churn, of course. They have like 40%, 50%, 60% churn, and you have to understand how sustainable that really is in the long run.
Still, in a category like infrastructure management, which is the category we're in, it takes a hell of a lot of money to build a product, distribution takes a long time, and convincing an enterprise to start working with your software means sales cycles of, if you're lucky, six months, okay? Most likely you build a relationship over a year, a year and a half, and then they do a POV and so on. We go bottom-up; we have an open source CLI, so a lot of the organizations that come to us now have used the CLI for a year or longer.
What does that mean? It means that big enterprise deals… for a VC, I think you become kind of sexy when you're in the category of $30 to $50 grand ACVs, right? Once you get there, cool. But does that mean that after being in the market for two and a half years, you will have $10 or $20 million in revenue to show? Probably not.
So I'm just trying to understand how investors look at the market right now. I think the ones that really understand the space understand the value creation that is happening, and that it just takes a lot longer, especially if you're an open source company. But yeah, it's scary, too, because the market is so crowded with companies that grow faster than ever, right?
It definitely is. I mean, it's funny, like, I feel like one of the things that we've learned… sorry everybody for turning this into startup talk - Startup talk with Cory and Sören.
Hahaha.
So, as far as fundraising and working with or acquiring customers goes, one interesting thing is that we have similar procurement timelines, I'd say. Very long procurement timelines. And we have some stuff in the open source space, and we're going to be open sourcing some other things eventually, so I would love to work more on that bottom-up motion.
We did invest a lot early on in ads, outbound email, all this stuff, and it just never worked, right? And it was funny: we had some VCs that get our space and some that don't, and the ones that didn't were like, “You guys just aren't personalizing enough. You need to do that more.” And I feel like that's only gotten more difficult with AI, because now everything is personalized.
What finally happened for us is that we realized we're not selling a tool that a developer can just buy, grab, and use. We're selling something that's going to come in and be almost like a framework for a good DevOps culture. So adopting our software is a shift. You have to be ready for that in your organization. You can't just be like, “Oh, I'm going to buy it and install it and we're good to go.”
What we finally realized is that the name of the game there is trust. That's why we started spending more time on content, workshops, and webinars. And that's where we started to see our customer base come from.
It wasn't from us going out and sending a million emails. It was us going out and saying, “Okay, who is having trouble doing this stuff, and what kind of content can we make to make the transition through the DevOps culture and through those stages easier?”
And if they don't buy our software, that's fine, because I might be using theirs and I want my data to be secure, right? I want them to have a better DevOps culture there.
It is really interesting. It's a hard space to sell into: there are a lot of tools, and it can get extremely noisy when you're trying to scream through all of that just by coming in with email and whatnot.
In our space, the buyer is incredibly smart, right? Or at least they should be. So whoever you sell to (a director, a VP, or at small organizations maybe the CTO directly) actually wants to see a shift in the engineering culture. And your investors also want to see that, right? You don't want to be the tool that is like, “We do this like small thing a little bit better and you save five minutes a day.” No, you want to cause a fundamental shift in how teams work today versus tomorrow, when Massdriver or Terramate or whatever solution comes in. So that's a big promise and a big bet that you and I are making as startups.
Then again, the difficulty is that if you sell this to a VP, a director, or whatnot, the engineer is not necessarily bought in, right? Because they don't want that change. They like doing the thing, and I get it, I'm a developer as well. I've worked half my life in engineering, right? It's always bad if somebody comes in top-down and tells you, “Oh, now we're migrating away from GitHub Actions to Bitbucket Pipelines.” “Oh my God, please no,” right?
Yes.
The thing that you can do with open source and content is to build trust. And to eventually get reach to those engineering teams so they have seen you, they've worked with your staff, they believe in your vision, they like whatever you do. So that eventually if you sign a deal that involves a buyer that goes top down, if they go to the engineering teams and they're saying like, “So Terramate Cloud, we should explore this.” They say like, “Yeah, we've been using the CLI for some time, or we at least took a look at it.” - It's a lot easier, but it takes a long time, and a hell of a lot of money, and other resources too.
Content is really hard, and I learned that the hard way too, right? I read your content, I like it. I don't believe my content is bad, but also too, dude, I'm not a native English speaker, right? So it takes me a lot longer to produce something great compared to somebody that is native - just being honest here, right?
Anything else… banner ads (I see a lot of our competitors doing this on Reddit), outbound emails, that kind of stuff, I don't believe those things are working well. What works okay-ish for me is that, honestly, I've written to a bunch of people on LinkedIn like, “Hey, have you seen Terramate CLI? It has these benefits, it solves these problems, and it's open source. I'd love for you to give me some feedback.” That sometimes works, but I bet you that at least 500 people have blocked me by now.
[laughs] That's your KPI - how many people on LinkedIn block you.
Oh, that's funny.
Writing content is… it's one of those things like… yeah, I imagine it's got to suck even more like targeting a mostly English-speaking audience and it being your second language. But it's funny, like when you talk about like the amount of time you spend on it. I feel like, you know, sometimes I'll sit down here - this is my whiteboard, it's like a digital whiteboard - I'm whiteboarding ideas, I've got drafts going, I'm sending them to people, I'm like two weeks into something and I post it and fucking nobody reads it. And then I'll just say something out of my ass. Like I'm on the toilet, I got LinkedIn open, I just fire something up in 10 minutes and it goes viral. And I'm like, “Do I just need to sit on the toilet more often and get on LinkedIn? Like what is going on?”
[laughs]
Like the amount of effort that you put in sometimes does not reflect the value. Whether it's entertainment value or like an actual little piece of knowledge, like a little tiny thought piece, right?
Yeah.
It's wild. And then, other times, like we spend a lot of time like building up workshops. And it's like, yeah, it takes us a very long time, but it also has a long tail because we put it on YouTube or whatever. And we'll have people… four months after we had a workshop, somebody watches the video and then all of a sudden we see them like come through the site. And then they sign up and it's like, “Oh, I learned how to use OpenTofu from you, so we figured we'd check out your platform.”
I mean, at the end of the day, it's very transactional, right? If you produce content - and I didn't mean to say that it sucks, I'm just saying I feel that I'm at a disadvantage. I still enjoy it a lot, but it takes me longer than it probably takes you - so what I'm trying to say is it's transactional in the sense that what we're doing is really thought leadership.
So you are providing an opinion, or a how-to guide, or insights, or whatever provides value to your audience. Sometimes it can be entertaining. I personally don't do that anymore. Also, I'm German - we're not necessarily known for being funny. So I try to focus mainly on educational content. And that's what you do.
You'd be surprised how many people silently follow you, read your content, they never engage until you meet them in person, or until you run into them at KubeCon or re:Invent, or they may book a demo at some point. It's crazy. The amount of times I hear people in demos saying, “Oh yeah, I've been following Sören for some time.”
On actual content creation process… when I first started doing it, I obviously had no clue what I was doing, so it was messed up. And then my co-founder Chris, who came in a little bit later, he's very experienced with all things marketing, content, et cetera. We've set up this entire process of drafting ideas, creating a nut graph, and then a skeleton, and then we write it out. And I take more than one day to write an article, to be honest. I go to bed in the evening after writing the draft or the initial version, and I wake up in the morning and have 1,000 new ideas in my head, and it usually evolves over a couple of days.
Yeah. It's funny you mention the number of people that read your content… I don't know if you've had this happen yet, this was so weird for me the first time it happened. It was actually… I think it was at KubeCon this year… somebody came up to me and was like, “Hey, I read one of your posts.” And I was just like, that is weird that you recognize me. I mean, I'm flattered. This is super cool. It's never happened. Somebody's like, “I've read your DevOps bullshit post. I really appreciated that.” And I was just like, it's so weird... I don't know, it's very strange to me. I mean, it's exciting, I love it. But I was just like, yes, that's weird.
I’m now looking around like, “Who else has read this thing? Who else has read this thing and didn't like it and they're not telling me that right now?”
What I'd love to do is learn a little bit more about Terramate. So one of the things you said early on is you help break down these big Terraliths, like larger state files. What is the value prop of Terramate?
When you end up in a large monolithic environment, or any sort of IaC environment really, there are two ways of managing it on an ongoing basis, right? Either you build a platform around GitHub Actions, GitLab CI/CD, Bitbucket Pipelines, or whatnot. Or you go and buy a TACOS. TACOS are typically purpose-built CI/CD platforms. TACOS stands for Terraform Automation and Collaboration Software - I always get this wrong.
It's a reach of an acronym.
Yeah, yeah, but writing it out is a lot easier than saying it out loud. Anyway, those are purpose-built CI/CD platforms, and they are good products, but what we believe is that in 2025 people actually want to stick to the tooling that they already have. They want to run on GitHub, on GitLab, etc. So what we do is give them the necessary tools to basically turn their existing CI/CD into a TACOS. We supercharge it with the orchestration features that are missing - change detection is a big part of this. And we help them keep their code DRY.
Then Terramate Cloud adds, for example, detailed observability into how your Infrastructure as Code develops over time. Drift detection - there's an incident management system for newly detected drift and for failed deployments, which creates an event, integrates with Slack, and gets assigned to the right person or team so that it's actionable. And asset inventory management: if, for example, you fan out from a monorepository to multiple repositories, it's really hard to monitor your entire infrastructure footprint, so we unified this in a dashboard.
The idea here is that instead of buying into a single approach, whether that's Terraform, OpenTofu, Terragrunt, or Workspaces, directories, whatnot - all of those tools are very opinionated - we are actually very flexible software that allows you to unify all of those approaches. Because we believe, as I said earlier in the podcast, that wherever you stand in your IaC adoption journey, you may change your approach to managing environments. You may change your tooling.
So we unify all of those on a single platform - very flexible, very dynamic, no hard lock-in. It's all native Terraform and OpenTofu, and we don't struggle with security teams because Terramate doesn't need any access to your code, state backends, or cloud accounts. We work on plan files in CI/CD.
Very cool.
I would also put this in the category of framework platforms - very dynamic. And we definitely work with organizations that are, at least right now, technically a little bit more sophisticated. So it's not a rip-and-replace for Terraform Cloud, even though we have a bunch of folks that came over from HashiCorp. I think for most of those, there are direct replacements, you know them - Spacelift, Env0, Scalr.
So the other thing is, as a company, the one question that I would ask is basically - do you really think that five years from now we'll write all our infrastructure code manually? By hand? I don't think so. Honestly, in the age of LLMs… right now, results are not optimal, and when it comes to how to structure projects, how to design state files, how to design environments, that's something an LLM can't do. There's one company that I think is exploring this, Stakpak, a very interesting company to watch. It's a startup out of Egypt; the founder is a great guy, I met him recently.
So what I'm trying to say is that we're also heading heavily in this direction because I do think, as you said, that we are at a point of stagnation when it comes to IaC, or cloud infrastructure management in general - it needs innovation. So I'm taking a few big bets in the direction of AI that will be announced and published soon. Scaffolding is one of those big things that we're building right now: scaffolding entire end-to-end templates for complex infrastructure, but with state in mind, with all those underlying details in mind. So not only generating code. Hope that makes sense.
Yeah, yeah. So I won't ask you about the things that are soon to be announced. I want to, but I'll hold off on that. I do want to come back to AI in a moment.
Sure.
There's a portion of Terramate, a CLI, that's open source today, and then there's Terramate Cloud. So when somebody's coming to look at… let's say one of those LinkedIn people is like, “Okay, I'm checking out Sören’s CLI, let's see how it works.” Is that what they're actually putting into their GitHub Actions or Bitbucket Pipelines? Like, is that actually going to be part of their CI/CD now?
Yeah, dude, you can…
Okay.
So yes, it's an orchestration and code generation tool. Instead of invoking terraform plan, tofu plan, or tofu apply directly, you run them through Terramate, and Terramate detects how many state files you have - we call them stacks, and the name is a little bit unfortunate because everybody is going in that direction, everybody and their grandmother - and orchestrates them for you. There's dependency management in there. Then, optionally, as you said, Terramate also looks at the produced plan files, extracts metadata and resource metadata, and sanitizes it on the client side. It redacts all sensitive values and then sends it to the cloud, and that's how the cloud operates.
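To make the client-side redaction idea concrete, here is a minimal sketch, not Terramate's actual implementation. It assumes the plan has been exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, whose JSON contains a `resource_changes` list where each `change` has an `after` value and a parallel `after_sensitive` mask. The function names, the `"(sensitive)"` placeholder, and the sample resource are hypothetical.

```python
import json
from typing import Any

REDACTED = "(sensitive)"  # hypothetical placeholder text


def redact(value: Any, mask: Any) -> Any:
    """Walk a value alongside its sensitivity mask and replace flagged parts."""
    if mask is True:
        return REDACTED
    if isinstance(value, dict) and isinstance(mask, dict):
        return {k: redact(v, mask.get(k, False)) for k, v in value.items()}
    if isinstance(value, list) and isinstance(mask, list):
        return [redact(v, mask[i] if i < len(mask) else False)
                for i, v in enumerate(value)]
    return value


def sanitize_plan(plan: dict) -> list[dict]:
    """Extract per-resource metadata from a JSON plan with sensitive values redacted."""
    sanitized = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        sanitized.append({
            "address": rc.get("address"),
            "type": rc.get("type"),
            "actions": change.get("actions", []),
            "after": redact(change.get("after"), change.get("after_sensitive", False)),
        })
    return sanitized


if __name__ == "__main__":
    # Hypothetical sample shaped like `terraform show -json` output.
    sample = {"resource_changes": [{
        "address": "aws_db_instance.main",
        "type": "aws_db_instance",
        "change": {"actions": ["create"],
                   "after": {"engine": "postgres", "password": "hunter2"},
                   "after_sensitive": {"password": True}},
    }]}
    print(json.dumps(sanitize_plan(sample), indent=2))
```

Only the sanitized metadata would ever leave the CI runner in a model like this, which is the property Sören points to when he says the cloud needs no access to code, state, or accounts.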
And that's a very nice model, because it means that if you have a large-scale Terraform infrastructure, you take the CLI, you install it, and it's a single command to onboard. It automatically detects all your root modules, all your stacks and dependencies, and bootstraps Terramate in there non-intrusively, without changing any existing configuration. Then by running an initial… we call this a drift check, you basically sync everything to the cloud. And dude, it takes you five minutes! Maybe 10, depending on the runtime.
What I'm trying to say is that the onboarding experience is extremely fast and extremely nice. It's as close to a Vercel kind of experience as you can get in Infrastructure as Code. And that is why I think we've been pretty successful recently, because for teams it's zero risk to try it out. The time to value is literally nonexistent, right? It's immediate. Plus, you don't really have to convince your Ops team or whoever manages access, because it doesn't need any access.
Yeah. Can you do like incremental adoption with it? Can you like roll it out on like a small…
Yeah.
Okay. Very cool. Very cool.
That's the other thing. Terramate has a bunch of features and all of those have different benefits. It's a framework after all. But in order to adopt it, you only have to adopt the orchestration. All the orchestration does is figure out how root modules relate to each other in terms of dependencies. Then, mostly in CI/CD, it figures out which of those contain changes in the current commit, branch, or pull request, or in a range of commits compared to the main branch. Then it orchestrates those - that enables parallelism, a lower blast radius, and faster runtimes. So it reduces your build minute consumption too. Then there's a long tail of features.
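The two steps described here, change detection and dependency-aware ordering, can be sketched generically. This is not how Terramate is implemented; it is a minimal illustration assuming each stack is a directory, the changed file list comes from something like `git diff --name-only main...HEAD`, and dependencies are given as a plain dict. Stacks in the same "wave" have no dependencies on each other, so they could run in parallel. The stack names are hypothetical.

```python
from graphlib import TopologicalSorter
from pathlib import PurePosixPath


def changed_stacks(changed_files: list[str], stacks: list[str]) -> set[str]:
    """Map changed file paths to the deepest stack directory containing each one."""
    hits = set()
    for f in changed_files:
        parents = PurePosixPath(f).parents
        owners = [s for s in stacks if PurePosixPath(s) in parents]
        if owners:
            hits.add(max(owners, key=lambda s: len(PurePosixPath(s).parts)))
    return hits


def execution_waves(selected: set[str], deps: dict[str, set[str]]) -> list[list[str]]:
    """Group selected stacks into waves; stacks within one wave can run in parallel."""
    # Only keep dependency edges between stacks that are actually selected.
    graph = {s: {d for d in deps.get(s, set()) if d in selected} for s in selected}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    stacks = ["stacks/network", "stacks/database", "stacks/app", "stacks/dns"]
    deps = {"stacks/database": {"stacks/network"},
            "stacks/app": {"stacks/network", "stacks/database"}}
    changed = changed_stacks(
        ["stacks/network/main.tf", "stacks/app/variables.tf", "README.md"], stacks)
    print(execution_waves(changed, deps))
```

In this simplified version, untouched stacks are never planned at all, which is where the blast-radius and build-minute savings come from; a real orchestrator may also choose to pull in dependents of changed stacks.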
For example, you can replace all your manually managed backend configuration and provider configuration - those hundreds of providers that you manage versions of - with code generation. At the end of the day, what it does is replace that manually maintained file with a file that is generated. But it's still native Terraform, which also means that whatever features of Terramate you use, if you decide to step back… if you decide to ditch Terramate, even half a year in, you end up with native Terraform.
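The idea of replacing hand-maintained backend blocks with generated ones can be illustrated with plain string templating. Terramate does this with its own code generation configuration, which is not shown here; this sketch only demonstrates the principle. The `backend "s3"` arguments (`bucket`, `key`, `region`) are standard Terraform S3 backend settings, while the bucket name, region, file name `backend.tf`, and the key-per-stack convention are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical template; the bucket and region passed in below are placeholders.
BACKEND_TEMPLATE = """\
# GENERATED FILE - change the generator, not this file.
terraform {{
  backend "s3" {{
    bucket = "{bucket}"
    key    = "{key}"
    region = "{region}"
  }}
}}
"""


def render_backend(stack: str, bucket: str, region: str) -> str:
    """Render a backend block whose state key is derived from the stack path."""
    key = f"{PurePosixPath(stack).as_posix()}/terraform.tfstate"
    return BACKEND_TEMPLATE.format(bucket=bucket, key=key, region=region)


def generate_backend_files(stacks: list[str], bucket: str, region: str) -> dict[str, str]:
    """Return {file path: file content} for one backend.tf per stack."""
    return {f"{stack}/backend.tf": render_backend(stack, bucket, region) for stack in stacks}


if __name__ == "__main__":
    files = generate_backend_files(["stacks/network", "stacks/app"],
                                   "example-tf-state", "eu-central-1")
    for path, content in files.items():
        print(f"--- {path}\n{content}")
```

Because the output is ordinary HCL, deleting the generator leaves behind working Terraform files, which is the "no lock-in" property described in the conversation.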
There's no lock-in, so the business risk is literally non-existent. Obviously that also means that, from a business perspective… from a business perspective you want to have lock-in, right? But from a customer's perspective you do not.
I think, at the end of the day, if you do a really good job with your technology, you shouldn't worry about that. You shouldn't worry about needing lock-in so people don't ditch you. If you have a tool that provides value, and an engineering team that adopts it, and they see where you take risk, churn is none of your concern.
I was going to say that, I didn't want to sound presumptuous or too egotistical - which I am both, I admit. Wait, is it ego if you admit it or is it confidence? I don't know, maybe… it's ego, it's ego.
I say it's ego.
It's funny you say that, because there is a lot of, there's a lot of panic… or not panic, it's valid worries about lock-in. And there is a lot of stuff that locks you in… in different clouds, not just like cloud, like we think hyper cloud or hyperscalers, but just in all things like SaaS nowadays. And it's one thing to be like locked into some one-off tool I'm using for editing my grammar or whatever, but it's another thing to be locked into something that is, like you said earlier, something that is fundamental to how your engineering team works, like the heart of how your software is released and managed.
That sucks and the thing that sucks the most is the people that get locked-in in that scenario are generally operations folk who are the most underwater on the team. It's harder for them to get out of that lock-in or move something because they don't have the time.
But you know that lock-in only ever, or mostly ever, becomes a concern when the vendor increases prices.
Yes.
It's not so much that we chose the wrong technology or it doesn't work for us anymore. It's always, “Oh, you guys increased pricing and now I want to move somewhere else.” And that sucks, right? That sucks. Obviously I can't promise that we'll never increase our pricing. But, for example, I've seen this very recently… I don't want to mention any names here, but a bunch of our competitors have…
We might be thinking of the exact same company.
Yeah. So one of our competitors, I think they doubled pricing or came very close, so a bunch of companies were coming to us… and they have a great platform, a great product. An outstandingly good product, but it's all about earning and retaining your customers’ trust at the end of the day. For example, I would be very hesitant to double the price for an existing customer that I have a good relationship with, versus maybe doing this only for new customers, in batches. But yeah, hard to generalize too, I guess.
I can't imagine doing that to a customer, honestly. It's funny because at the end of the day, I feel like we're also both ops engineers, right?
Kind of going back to what I was saying a second ago, I feel like having a feature that is anti lock-in… like we have an anti lock-in feature. You know, I was talking to the guys from Terrateam, they have some anti lock-in features. You guys do as well, like you can walk away. And I feel like that's one of those things that helps build trust with an operations engineer who typically may not have as much trust in, you know, just adopting a tool like this. Knowing that they can walk away and it's not going to be some sort of net negative on them that's going to be a big pain in the ass to move later.
The funny thing is I'd be curious, like it's also one of those things it's like you don't have to… I think when you build it, this is going back to the ego thing earlier… Like I think when you build a good product, having anti lock-in as a feature is fine because you know you built a good product.
Like we have it. It has literally never been used. And it's something we advertise, it’s something we spend time on, and it closes sales, but it's never been used. And I think like that's… I feel like if you have to depend on lock-in to have a good business, like you probably don't have a great product.
It depends. For example, take the CRM, if you're using HubSpot. HubSpot is an incredibly great product. Until I discovered that, you know, to put someone into a sequence, you have to have an enterprise tier that costs you, I don't know, a couple of thousand bucks a month. And I'm like, “Guys, this is complete….” And this single decision by HubSpot bugs me at night, in the sense that I want to migrate away, but I can't.
I can't because I'm so locked into HubSpot that… moving away from HubSpot would take me about as much effort as starting a new company.
Pretty much.
So I would actually prefer them listening to their customers and like changing this, so I'm happy again like paying them whatever we pay them a month, right? Versus like lowering the satisfaction with your customer to extract more money is never a great pitch. It's never a great pitch.
Right.
And locking them in, in a sense that like they upfront have to discuss whether or not to go with your solution because it causes too much risk. It's also not a great pitch, seriously.
So we try to prevent this. We have some principles built into Terramate that define how we work with customers and how we build products, things such as non-intrusive integration, obviously in native environments. It's a list of seven or eight points. We've been doing fairly well following those and gaining people's trust, in the sense that if we are for you, then you should try us out. It's literally quick. It doesn't take any time to onboard Terramate. But then if it's not for you - even a couple of months or a year in - you can move away without a big headache, right?
Yeah, I really like that.
So the customers that come in, they're using the CLI tool, they're trying the open source one. Like what is the point where they're like, “Oh, we have to start using Terramate Cloud?” What is that first thing that's kind of pulling them in and that big value prop there of what they're going to get beyond just the orchestration?
Yeah, so there's two types of organizations that we talk to. I would say the one type of organization is the organization that’s like, “We see Terramate for the first time, we're interested in learning what you guys do.” So once we explain to them how this actually works, they have a hard time understanding the model because the operating model is different to what you’re used to from HashiCorp Cloud or Spacelift or any other TACOS provider. So I don't necessarily think - it's called ICP, ideal customer profile - that like those are the companies that we typically engage with just because they have to change their mental mindset and that's typically tough to do. Changing somebody's opinion on something that they're used to. They're just looking for an alternative or vendor compensation and stuff.
The pitch that works really well for us is, as you said, somebody discovered the open source, is solving a bunch of problems with it, maybe for a couple of weeks, months, or sometimes even years. Then they get to a point where they're like, so this is great, but now how could we keep an overview of everything that I'm doing here?
For example, there's one thing in Terramate that's really neat. Terramate is stateful orchestration. If you have 100 different stacks (or call it workspace or whatnot), and you orchestrate a bunch of them, but then two of those fail, naturally the dependencies will not be deployed. So within the same state file, you end up having a partially applied plan, most likely, and then all other state files are skipped. So now with Terramate stateful orchestration, you can basically pull this in and rerun only partially applied plans, only dependencies that have not been triggered and those kind of things.
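A minimal sketch of the "rerun only what didn't finish" idea, assuming the previous run's outcome per stack has been recorded somewhere (that is what a stateful backend provides). The status labels, stack names, and function name are hypothetical; this is not Terramate's API. Stacks that failed or were skipped are selected, already-applied dependencies are treated as satisfied, and the selection is returned in dependency order. Rerunning the failed stack's plan and apply is what reconciles its partially applied state.

```python
from graphlib import TopologicalSorter


def stacks_to_rerun(status: dict[str, str], deps: dict[str, set[str]]) -> list[str]:
    """Pick stacks whose last run failed or was skipped, in dependency order.

    status maps stack -> "applied" | "failed" | "skipped" (hypothetical labels).
    """
    rerun = {s for s, st in status.items() if st in ("failed", "skipped")}
    # Applied dependencies are already satisfied, so only edges inside `rerun` matter.
    graph = {s: deps.get(s, set()) & rerun for s in rerun}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    last_run = {"stacks/network": "applied", "stacks/database": "failed",
                "stacks/app": "skipped", "stacks/dns": "applied"}
    deps = {"stacks/database": {"stacks/network"}, "stacks/app": {"stacks/database"}}
    print(stacks_to_rerun(last_run, deps))
```

Without recorded run state, the only safe option is to rerun everything that was selected in the original run; keeping that state is what makes the narrower retry possible.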
Or in terms of asset management, keeping an overview of what infrastructure is managed by which teams, environments, and projects across multiple repositories. Those are issues that occur at this scale. And by scale, I mean an engineering organization with several hundred people and at least four to five DevOps engineers. That's when people spend more and more time figuring those things out, debugging becomes more frustrating due to the complexity of IaC and cloud, management becomes too cumbersome, and that's typically when they get in touch with us.
So we have thought for some time about actually requiring a cloud account to use the CLI, which I think would obviously massively drive up conversions, but I don't want to do that because I still think that you should give people the chance to get the value that the open source is promising without leaving their data somewhere.
Yeah, I love that.
Those are the conversations, as a CPO, as a product person, I'm kind of like discussing on a daily basis. But it's also how you lose trust, right? Like, oh yeah, so now you have to register to a cloud account and automatically all your data gets sent there, right?
With Terramate, that doesn't exist. In Terramate, you have to opt in to what data you sync. None of your stacks are synced to the cloud by default, right? You have to opt in for every single operation. You can set this on a global level in a repository, but by default it's not on.
Yeah. Now with people with open source CLI, there's no restrictions on the number of plans they can run. They can just…
No.
That's awesome. I love that.
The way that we restrict, or limit, people in the open source is, for example, zero restriction. But if you want advanced features… for example, from a developer point of view, if you're used to Terragrunt and you run an apply-all in parallel, the logs are completely random in the terminal, right? Because you use concurrency, multiple threads. So they're output randomly and you can't read them. With Terramate, if you use the cloud, they're synced and streamed to Terramate Cloud, so that on a per-stack basis you can see the logs in the right order. Those are things that are just not possible at the CLI level.
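The interleaving problem can be shown locally: if each parallel run writes straight to the terminal, lines from different stacks mix together; if each run buffers its own output, the logs can be shown per stack in order. The sketch below only simulates this with a hypothetical `fake_stack_run` stand-in and Python threads. It does not model how Terramate Cloud streams logs, and the stack names are made up.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fake_stack_run(stack: str) -> list[str]:
    """Hypothetical stand-in for running a plan inside one stack.

    Each run buffers its own log lines instead of printing them directly,
    so concurrent runs cannot interleave their output.
    """
    lines = []
    for step in ("init", "plan", "done"):
        time.sleep(random.uniform(0, 0.01))  # simulate uneven runtimes
        lines.append(f"[{stack}] {step}")
    return lines


def run_with_grouped_logs(stacks: list[str], workers: int = 4) -> dict[str, list[str]]:
    """Run stacks in parallel and return each stack's log lines in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return dict(zip(stacks, pool.map(fake_stack_run, stacks)))


if __name__ == "__main__":
    logs = run_with_grouped_logs(["stacks/network", "stacks/database", "stacks/app"])
    for stack, lines in logs.items():
        print(f"== {stack} ==")
        print("\n".join(lines))
```

The trade-off of buffering is that you only see a stack's output once it finishes; streaming each stack's lines to a backend keyed by stack avoids that delay, which is the approach described for the cloud.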
So we don't try to limit people on the CLI in terms of functionality, but naturally, whatever is only possible with a state machine or stateful backend is in the cloud. Or whatever is useful when operating at scale, in bigger teams, is in the cloud. But even then, you have unlimited concurrency, unlimited parallelism. Obviously, all that stuff runs on the client side, so on GitHub Actions, right? And even then, we see 80-90% reductions in build minutes due to change detection, parallel execution, and whatnot.
So to sum this up, the way that we limit the cloud is basically by the number of resources that you can monitor. For example, the free tier has a couple of thousand resources included - I think it's 2,000. Anything beyond the most recent 2,000 resources, you will no longer be able to review. But everything that comes in new, like a new plan, will be in the cloud.
There's no “Oh, I can't use this anymore.” It's just that if you want a bigger history, at some point you have to pay us money. And unlike the resources-under-management model that we see at HashiCorp, we don't have that. We also don't make you pay for any null resource or any data source. So the amount of data that you can retain, observe, and manage in Terramate Cloud - similar to a concept I would say Datadog uses - as well as the retention period of that data, are both variable parameters of the plans available in Terramate.
Okay, so you can even use the CLI with essentially a free Terramate Cloud account and actually get some of the visibility there. That's cool.
Yeah. Yeah.
That's awesome.
So I would love to come back to AI for a minute. As you were saying earlier, the LLMs of today are not the LLMs we're going to have like five years from now. I’ve got two questions in one: like how are you personally using AI today in your work and what kind of things are you seeing people look for in the Operations space around AI?
Yeah, so I heavily utilize AI in all aspects of my life. Two examples, one is reclaim.ai, which is like this AI-driven calendar management. It automatically adjusts my appointments, books time in between meetings. For example, if I have a lunch appointment, it may not be high priority, and then a customer call is coming in, it automatically reschedules my low priority lunch appointment in favor of the demo that I'm supposed to give. It's great. It's cheap. It does its job.
Another one is Fyxer. It's a company out of the UK. They scan my entire inbox. Then they sort my emails and ditch everything that I'm not supposed to read because it's crap. And then for the stuff that I'm supposed to read and answer, they automatically draft the response using my voice, my tone, right? I'm just using it for my private emails for now, because data security is something we have to consider for business stuff. It helps me reduce the time I spend on private emails so much. Similarly, there are tools that I've adopted for almost every aspect of life.
So how do I use it for more development-centric tasks? Dude, I long ago ditched VS Code in favor of Cursor, and now I’ve moved on to Windsurf, which is basically prompt-driven development with AI. You can still write code manually, and I do that, but almost every single new thing that I do when it comes to stuff I develop on our website - and I still do that sometimes, even though I shouldn't, right?
[laughs]
Or other prototypes, I do those with prompts. I have not written a single SQL query in one and a half years, honestly. It's all prompts. So it just gives me an advantage in terms of speed, autocompletion, prompt-driven development, and whatnot.
How I see this being extremely useful for our industry?
Look, when people say LLMs are not mature enough to produce Terraform code, well, that's because of the scope of the problem that you give them. If you break the problem down into multiple smaller problems and use multiple prompts for them - an agentic kind of workflow is the right name here - with very precise instructions and prompts that you provide context with, examples, documentation, best practices, whatnot, then the results become so much better. But it's a matter of knowing how to use those technologies in an advanced manner.
So now obviously everybody can go to ChatGPT or Claude or whatnot and say “Generate me the production grade code for an EKS cluster.” Dude, that's not a great benchmark, honestly. Sorry, what are we talking about?
So the impact that I see for our industry - I think right now a lot of the low hanging fruits are in anomaly detection, error explanation, like understanding what is happening better. What we are heavily going into is remediation of misconfigurations, understanding failures and having them fixed writing code - prompt driven basically.
So let's say your Terraform deployment once in a while fails not because of a provider issue - yay! - but because of an actual misconfiguration that you introduced into your code. For somebody who's not very familiar with it, it's sometimes very hard to debug and to fix the root cause. So it's a perfect case for AI. Resolving drift, remediating drift, or importing changes: perfect cases for AI.
It's always the question, is this done better in a deterministic way or can we actually generate this kind of stuff? And I think LLMs are very close to being extremely good if you use them in the right way.
For generating end-to-end templates, I don't even think it's the most exciting use case, to be honest. Like, yeah, it's exciting to have an LLM that is trained on how you like to write code, right? Obviously, most of the public LLMs are trained on public data, none of the really great IaC is public. Surprise, right?
Yeah.
But if you use this internally as an organization, and you manage to train it properly, you can get a lot of great value. So I dropped a few sneak previews along the way here.
Okay.
Those are the directions that we're going in with Terramate. And I personally find this extremely exciting. I find this space extremely exciting.
I do too. I love what you said there about like having the operator almost agentic in the middle. I feel like that's the thing that, you know, is hard about all things AI. Like if you don't know the subject matter… even just like asking a fact, right? Hopping on ChatGPT and being like, “When was Lincoln born?” It might come up with the right answer, it might not come up with the right answer, but you don't know unless you read Wikipedia and nobody's changed Wikipedia recently, right?
I've seen this with customers, like, “Oh, we're just going to generate… we're going to have ChatGPT generate some Terraform.” And it's like, “Emm, what's your experience like?” Right? You see this stuff where it's like, “Give me a production grade EKS cluster.” And then there'll just be a comment, and it'll be like “production grade configuration here.” And it's like, yeah, you’ve got to help it along a little bit.
But I think that's a really good analogy. It's like almost like the person acting as the agent now of like being able to look at it and understand.
I've used AI in quite a few places in my life now. Where I've really seen it be useful at work is when I'm doing any sort of reverse Terraform. When generating Terraform, I tend to use AI, but it's me looking at tags from the cloud, like resources, and talking the AI through how to build a module for that and how to handle the state for it. It's something that I've done manually for years, and now I can just tell this thing how I like to do it and it does it. This is great.
But I've seen other people be like, I'll just have it generated. And it generates garbage code, right? And they're like, “I got code and it's green. Terraform apply works.” And it's like, “Eh, it works, but the team's not gonna be happy about it.”
You know what they call us, right? You know what they say, garbage in, garbage out.
Mm-hmm.
That's it, garbage in, garbage out.
The other place I've actually seen it, like not in IaC world - I'm very into test-driven development. That's just been my way for a very long time. I was very cowboy in my early days. I like to write some code and then test it later. I lost a bet about 15 years ago and I've been TDD ever since. But I write my tests now and I use Cursor. I'll have to try this other one you said, Windsurf?
Windsurf, yeah.
I'll have to try this. I'll write my tests and then I'll have Cursor implement the code for it. And a lot of times it'll get like pretty close. I’ll have to go in there and like do a little myself, but it's like… it put the function in, it got the method signature in, like it missed some of the logic, but like it set everything up, got all the variables in there right, and I just kind of go in and kind of tighten something up. It's like, yeah, I value my tests more than my code usually. So it's like, I'll write that and it helps, I think it helps the AI get what I want rather than me trying to like write a description, right?
I should sign up for an internship at Massdriver to finally wrap my head around test-driven development.
Oh yeah.
I do write tests, but I don't treat TDD as a first-class citizen. But yeah, it's the same, man. I use it for generating code, whether that's at the test or the implementation level, but also to optimize code a lot. A lot of what I do with LLMs right now is hammering things out, being productive, and getting to something working very, very soon before I start optimizing, right? Because I do think, especially in product, when solving a problem you always want to take the most direct and easiest path to the solution, right? And then you can start optimizing.
Yep, exactly. Awesome dude, well thanks for coming on the show.
Anytime. Been waiting for this for too long.
If you need a ride home from the airport, holler at me. We'll grab lunch.
Sören Martius, thanks for coming on the show. Check out Terramate, it's terramate.io. And what's the GitHub address for the org?
Terramate-IO/Terramate, I guess.
Okay, very cool, and we'll put that in the show notes so you can check it out.