Foundations of The Cloud With Mitchell Hashimoto, Terraform
In this third episode of the Platform Engineering Podcast’s special series on the Foundations of The Cloud, host Cory O'Daniel interviews Mitchell Hashimoto, co-founder of HashiCorp and creator of Terraform, Vault, and Nomad. They discuss the intricacies of platform engineering, the history and evolution of Terraform, the advent of infrastructure as code, and the challenges accompanying it. Mitchell also shares insights on his new project, Ghostty, a high-performance terminal emulator, and delves into how generative AI will transform operations engineering. Listen to this episode to get valuable lessons for both industry veterans and newcomers.
Thanks for tuning in to this episode of the Platform Engineering Podcast. I'm Cory O'Daniel, and today I have with me Mitchell Hashimoto, co-founder of HashiCorp, creator of Terraform, Vault, Nomad, and all the other tools that you probably use day in, day out. Mitchell, thanks for joining us today.
Thanks for having me. Nice to be here.
Yeah. I heard you've picked up a new certification. You did not go get your Terraform certification. You got a new certification, right?
Yeah. I wonder if I would pass the Terraform certification today. Now that you said that, I'm not sure, but I recently got a new pilot certification to fly a specific type of jet, which is really exciting.
That is pretty awesome. Is that like your passion today now that you're not, you know, in the office day in, day out?
Kind of. I mean, yeah, I'm flying a lot more nowadays, but I think my first lesson was in 2019.
Okay.
And I did fly for a while, not for work, but as part of it. I live in Los Angeles, and, you know, we would do exec meetings and board meetings and all sorts of stuff up in San Francisco. That's where HashiCorp headquarters is and was. And I would fly myself up there for that. So I've been flying for a while, but especially now post HashiCorp, definitely the purpose of the travel is quite different.
Nice. Very cool. Have you had to do, or have you done any of those... I've watched some really freaky Reddit pilot trainings where people pull out of stalls. Have you gone through that?
Oh, yeah. You have to do all that. They look scary to, I think, the average non-pilot, but they're a part of pilot training. You do it literally your third week of pilot training as a student. Now, I had to do it again in the new aircraft for the new certification. And, it's a rating technically. And I'll have to do it again, I'll have to redemonstrate it every year. So, yeah, we do it all the time and it's not a scary thing to us.
Okay. I got sweaty palms literally asking the question. That shows how I'd do in a plane.
Yeah. I mean, I think that's part of it, you want to, especially as a pilot, you want to make sure that if you're in that scenario, you look out the window and you're like, this is fine. I know how to get out of this. Yeah.
Yeah. Okay, so your purpose of flying has changed a bit. So let's say you want to go to, like, Santa Barbara for a sandwich, but you're coding right now, and you've got to that sweet spot where you're like, “Oh, I figured it out.” Do you go get the sandwich that you're planning to go get, or do you finish working on what you were working on?
Flying should always be your last priority, I think. I think it's one of the things they teach you, whether you're in tech or not in tech as a pilot, is that with general aviation, you have to have the mentality that you have nowhere to be, because if you have what they call "get there-itis", that is a really big cause of accidents because you start skipping steps and rushing and forgetting things and making mistakes. You don't want to be that person. So, really, it's like, it's a luxury to be able to go, and you want to make sure you take due diligence, take time. So if I was heads down and I was just about to solve a problem, I would probably just either cancel the trip or just delay it significantly and eat lunch at home and go for the afternoon or something. So you don't have a need to be there.
Get an afternoon coffee in Santa Barbara is what we're talking about.
Yes.
Awesome. Very cool. So I think we're coming up on, or if not, when this airs, it'll be the ten year anniversary of Terraform.
Wow.
Yeah.
That's wild. That's wild.
It's been a minute.
I can't believe it. Yeah, that's crazy. I remember when it was the ten year anniversary of Vagrant. I don't know when that was now. I mean, that was probably, like, five years ago now or something, but that was a weird feeling. So like, Terraform feels so much newer to me than that that it is weird to think that it's been ten years.
Have any recruiters hit you up seeking twelve years or more experience of Terraform?
All the time, all the time. My LinkedIn is constantly filled with job opportunities for HashiCorp products requiring more years of experience than the products have existed.
That's got to be funny or annoying.
Super funny. They're all canned emails, right? But I love how innocent the emails are. Like, “You seem like you might have experience with Terraform”. It's like, I might.
Yeah, I may have stood something up once or twice.
Yeah, yeah.
Awesome. So, can you take us kind of like back in the early days of creating Terraform? What were the key challenges at the time? Like, we had a couple other tools in the space. There was Ansible, there's CFEngine, Chef. What was going on in the cloud that made you reach for designing a new type of tool?
There's various ways to answer this question, but I think coming from the position of Chef and Puppet and Ansible and so on is the right way to think about it. I was a heavy user of all of those at some point. Different companies I worked for and stuff. I used Chef for a long time. I used Puppet for a long time. Ansible briefly. And the issue I had was that cloud was sort of blowing up and, more generally, this idea of API driven infrastructure was blowing up. You had more than just one server. You could click and get one server, and it's not bad, but now you had a server, security groups, EBS, load balancers, all sorts of little resources, and all these other tools that seemed like they would fit, like Chef and Puppet, were all very focused on a single server. And one way I like to describe it is the nouns of the project. You can figure out the focus of a project based on the nouns that it creates or uses. And the nouns of Chef and Puppet were all around, like services, files, packages, right? Like, that was not the right granularity and they were trying. Chef had something called Chef Metal, Puppet had something called Puppet AWS, I think. And there were others. There was Docker Machine, which was about to come out. Terraform predated that just by a little bit, like months and stuff like that. So I just saw a need for a tool to act a lot like Chef and Puppet, like spiritually very similar, but work on the outside of the server part. These resources were more like infrastructure components rather than packages.
Yeah. And that started to become like a real need at that point in time. That's when the way we use the cloud changed a bit at that point.
[Baby cries]
Who is that? Do we have another special guest?
That's my ten month old daughter. She's going to sleep soon, but she gets really excited whenever I'm talking to a camera.
How was the first Father's day?
I kind of missed it. We had to travel on Father's day, I was with the family, but we couldn't really do anything special for it. And that's okay. Like, I didn't need to do anything special for it. It's all good. Every day is just a crazy reminder that this little person is in my life.
They are so much fun. Okay, back to Terraform nerds.
Yes.
So, like, at this point in time when Terraform was coming about, like, this is also the time where I feel like a lot of the way we kind of use the cloud changed a bit. Like, most people at the time were using the cloud as a way to run a workload. But at this point, that's where you start to see your applications composed of the cloud. You're starting to use step functions and Lambdas, SQS, SNS, like a lot of our software is starting to be composed of different cloud services. And I think that's one of the things that made Terraform a bit, I guess, easier to use, maybe than some of these other tools because of the way that it was designed. Also, I feel like it created an opportunity for us to start thinking about infrastructure as code, a bit wider than just the operations team, which is where a lot of that thought and a lot of that DevOps effort was happening. Did you see many application developers starting to lean towards Terraform at that point in time, or was it still very ops-centric?
It felt pretty Ops-centric, except for the hobbyist type person or full stack type person that was doing everything from writing the code to deploying it. But I would say in terms of businesses and that sort of adoption, it felt pretty Ops-centric. But I think you're exactly right that before, all you really needed to run an application, deploy an application, was SSH information. You needed a server and you needed SSH information and probably like a public IP address or something. You're exactly right that it turned into you needing many little pieces. I don't see them as much anymore, but for a long time those like isometric AWS diagrams were really popular where you were kind of looking at these boxes and arrows and like there was a load balancer and the arrows went back to EC2 instances, and they went back to like RDS and all sorts of stuff like that. You would see this complicated thing and you would be thinking about it and it would be like WordPress. That's the cloud version of WordPress. I think back and I think of that, there's pros and cons to that whole thing. We don't need to get into that. But if you look at that, that is what really definitely rocket shipped Terraform or infrastructure as code in general. You said something a bit earlier too, which is that at that time a bunch of people were starting to need something like Terraform. And another piece of evidence to that is like we were creating Terraform out of recognizing this need, but so were a bunch of other people. So I mentioned Chef Metal, Puppet AWS, and Docker Machine. There were a bunch of other competitors to Terraform that were at different points in time more popular than Terraform, right. Because Terraform wasn't first to market. It wasn't. It was like the fourth. Depending what you count as a true competitor, it was like fourth. So, a bunch of other people were also figuring this out at the same time.
Yeah. And then one thing that's pretty unique to Terraform versus many of the other tools at the time is the state file. So that kind of separates it from Bicep and CloudFormation and whatever Google's is currently called. What were the technical challenges that drove you towards the idea of the state file?
Well, I would disagree with you because I don't think the state file is unique. A lot of these, not everyone, but almost all these infrastructure as code tools have a state file. They just hide it from you because they're server side and they have a database. So CloudFormation absolutely has state. It's just stored in a database that you can't see. One of the things I wanted when we were first designing Terraform was I wanted you to be able to run it on your own computer without having to pay for a service. You could own all your data and run it by yourself. That was philosophically something I wanted, but also we were a small company and I didn't want to maintain a service at the time. But in order to do that, you still need the database. And so the state file was basically the database that a SaaS service would have hidden from you, but it was now on your machine and you can move it around and do anything you like with it. People don't like it because it is extra complexity that they'd rather not have, and in early versions of Terraform I did try to magic away the state file. Like, I tried to use runtime reflection to figure out what resources you have, but you quickly run into... and this is all online. Like, if you google "Why Terraform state file?" there's a documentation page. I wrote most of it, it's been edited since then. It explains why this exists. You run into issues like you can't figure out what Terraform created versus what a non-Terraform tool created. Or it's hard to figure out what your instance of Terraform created versus what someone else's instance of Terraform created. And then also, I think the big one that everyone takes for granted nowadays, because most tools handle this now, is removing resources. So if you delete something in the config file and you run terraform apply, it'll delete it. But ten years ago, Chef, Puppet, none of them did that. None of them did that. Ansible. Like, none of them did it.
So I like telling this funny story where one of the first companies I talked to about Terraform, I went to a board meeting room with their whole infrastructure team and I was trying to pitch them on this Terraform idea. Wasn't super popular yet, but they were receptive. The first question I got, someone raised their hand and said, "If I delete this thing from the config that you're showing me, will it delete the EC2 instance?" I said yes and everyone in the room cheered and clapped. That's how special and unique that was ten years ago. Now we just take it for granted but that's only possible because of the state file.
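To make the role of the state file concrete, here is a minimal sketch in Python. It is not how Terraform is implemented; the `plan` function, the set-based model, and the resource names are all hypothetical and exist only to illustrate the idea described above: a record of what the tool itself created is what lets it tell "mine, and removed from the config" apart from "someone else's".

```python
from typing import Dict, List, Set


def plan(config: Set[str], state: Set[str], cloud: Set[str]) -> Dict[str, List[str]]:
    """Toy plan: compare desired config, recorded state, and what exists in the cloud.

    All three arguments are sets of hypothetical resource names.
    """
    return {
        # Declared in config but never recorded in state: must be created.
        "create": sorted(config - state),
        # Recorded as created by this tool but no longer declared: safe to destroy.
        "destroy": sorted(state - config),
        # Present in the cloud but unknown to both config and state:
        # created by some other tool or person, so it is left alone.
        "unmanaged": sorted(cloud - state - config),
    }


# "db" was removed from the config, "lb" was added, and "legacy-box"
# was created by hand outside of this tool.
config = {"web", "lb"}
state = {"web", "db"}
cloud = {"web", "db", "legacy-box"}

result = plan(config, state, cloud)
print(result)
```

Without the `state` set, the sketch could not distinguish `db` (which it created and should now remove) from `legacy-box` (which it never owned), which is the ambiguity the answer above describes.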
Yeah, I remember a company I joined in the early 2010s-ish. They were fully doing "infrastructure as code". I threw air quotes on that. One of the things that happened when I joined... they were using Ansible, if I remember correctly... they had a cloud bill problem. The team didn't realize that it didn't delete it. And so they saw this infrastructure that they thought they'd removed over time because they'd inherited this code and they were moving resources. And we get in and we're looking through everything and we're like, "What is all this stuff, that's not infrastructure as code over here?" And they're like, "What do you mean? Everything's in infrastructure as code." And I was like, "This isn't." And they're like, "We deleted that." And I was like, "When?" But yeah, that is one of the things you just kind of take for granted after you've used the tool for a while that, you know, just remove it and it's truly declarative.
Yep. It's not really a segue but there's a funny story in there about you being billed by something that you thought you deleted. I'm a big proponent of automated testing, I always have been. And Terraform has a suite of pretty decent automated tests. And we would run this end-to-end suite every night that would actually create instances and tear them down. And it would do hundreds or thousands depending what it was. And I won't shame anyone because we found control plane bugs in every single cloud provider or SaaS of the time because none of them were used to this very rapid create, destroy, update. People were used to create, wait a minute, maybe a few updates, wait a while, destroy a few minutes later. But we were just doing it within the same two seconds - create, destroy. In AWS, in particular, we found this bug where we could destroy an instance and if you did it super fast, sometimes because of race conditions, it would be gone... it wouldn't be in your account at all, it wouldn't be present at all... but you would still be getting billed for this thing. We had this bill that we looked at and it had EC2 instance IDs in there and they didn't exist. We emailed support. First line support has no idea really what's going on. First line support is like, "Yeah, they don't exist, but I do see them in your bill." So we just escalated. We weren't big enough to have direct partnership contacts yet but eventually we got someone who realized, “Oh, it is running, but it got deleted from your account. But the billing system still sees it”. I don't even know, something went wrong. We had all sorts of stuff like that that was fun to catch early on.
That is a scale that is probably pretty interesting. That is awesome. So we're in ten years now of Terraform being out. I think it was the State of CD report last year, found that only 27% to 30% of respondents were using infrastructure as code. I would love to know when you were first in that boardroom talking to those operations engineers, infrastructure engineers, you know other similar scenarios like what were the biggest challenges to getting IaC adopted at the time? And then what do you think is still keeping us from getting closer to that 100% adoption rate today?
I think there's multiple things. One's kind of funny... it's hard to believe now because I think this has been generally accepted now, but 10 years ago, to 15 years ago with other tools that I used, like the word ‘code’ in ‘infrastructure as code’ was problematic because a lot of these people that are historically operators - think sysadmins, think DBAs, think network engineers, that sort of role - like 15 years ago those people weren't coders. They didn't view themselves as coders. They were absolutely professionals and really good at what they did and invaluable but they were not coders. And I'm not putting that on them. They would tell me, "I'm not a coder, I don't want to write code, I'm happy doing what I'm doing.” They would write config files and stuff, which we might consider code. But they were like, “No, no, no, I'm not writing code. I'm happy clicking, seeing my UI with arrows and boxes” and things like that. And so a lot of the pushback early on, before we had this prevalence of infrastructure people that knew how to program, was that they didn't want to write code. So that's one thing I think that's dissipated quite significantly, because I think across careers more people know how to code and are comfortable with the idea of code. The other thing was really more of an immaturity of software problem. Immaturity of tooling at the time. We had this problem, but everyone else had this problem. We had all these fancy new infrastructure tools that solved this problem in a demo type environment or small scale type environment, but we didn't have the ecosystem around it or maturity in the core software to show people how to do that at multi-hundred or thousand person safety and scale. Like how do you do policy on this stuff? How do you report what you're getting? One of the big fears of infrastructure as code at the time was that it was so powerful that how do you not get runaway bills? How do you not get people spinning up a ton of stuff?
And again, even at that time, AWS had limited policy systems to even protect against that. Today, I don't even understand AWS policy systems because they're so complicated and intense, but then they just didn't really exist that much. I think those are the technical side and the human side of what was holding things back.
What do you think is keeping us back today? Because that number's still like, it's surprisingly low to me. And I know that a survey doesn't necessarily represent reality, but, I still meet companies pretty frequently that are along in their growth journey, you know, series A plus, and they've kind of done click ops for years and they've kind of racked up this technical debt. Like what do you think it is that stops those organizations from investing earlier in automation? And is there a benefit to investing early in automation?
I think the word you use is perfect. It's an investment and some people are going to make the choice that it's not worth it. And I don't think they're necessarily wrong. I think that there's a school of thought that you should do everything - you know, build a strong foundation so in the future you're better off, right. But there's also the school of thought that you should just do what you need to do to get stuff done and build technical debt. And that's okay. You just repay it over time and you kind of go back and forth. Yeah, I mean, I think that that's a big reason to do it. I think another one, besides consciously taking shortcuts, is just that I don't think it's that easy still. So as much as we have better training systems and more resources like books and docs and screencasts and all sorts of stuff like that, I think that it's still a lot of work. Sometimes clicking stuff in a UI is still easier. And even to this day, I don't see great, really high level, easy tools around... any infrastructure as code tool that really simplifies the problem. Yes, Terraform, for example, has a registry with modules and there's things like that. It's still not the easiest thing to get going. It's easier than writing bare Terraform, but I think that'll hold it back. And that really takes like a multi-decade type of process, to me, to reach that point. That's what it feels like with anything that I've used. Yeah, I mean, I sort of feel that way for now as a sort of, aside from infrastructure, I sort of feel that way today about like photo editing, for example. Like photo editing 15 years ago was like, you had to figure out Photoshop basically, and it was a power tool and it was hard.
I'm still figuring it out.
Well, I don't know how to use it because I didn't have to. Because eventually, things like the Photos app, like just built into the iPhone or things like a preview on your Mac, these things just had enough of the tools that you just like click a button and get exactly what you want, that we're now using those core technologies that photo editors used like 15 years ago. I think that's it with infrastructure as code. We're in sort of like the Photoshop era of infrastructure as code, and it has to get simpler to reach 100%.
So when you look at the cloud, like any given cloud service, AWS, GCP, Azure, I guess the big three, I don't know that their goals have ever been ‘to be easy’, right? It's been ‘to be capable’. They want to be able to sell their software to as many organizations as possible. Do you think that the tooling hasn't made it easy enough? Or do you think that the cloud is outpacing the tooling and how complex it's gotten?
I wouldn't point fingers at any specific person. I think it's the whole stack. Everyone's going to always be moving forward and shipping more complexity and more capability. And hopefully it's more capability and that's why there's complexity. Hopefully it's not just raw complexity! But I think you're always going to be moving that path forward and then you have this layer cake of abstraction. Everybody has it.
I think another big part of it is that cloud providers have done this great job giving you all these building blocks, but if you really adopt all these building blocks, your bill is outrageous. Like, that's what it feels like, at least in my opinion. So how do you enable better tooling on top? Everyone needs to take a cut, right, to be sustainable. And how do you even enable better tooling on top when, like, the base layer is already very expensive. I don't know how you get there. I think it's sort of a failure. For example, you see a lot of new PaaS coming up - That's not a failure. I'm really excited a lot of new PaaS are coming up, like Fly, and Vercel, and things like that - But I think the failure is that a lot of them are building out their own data centers and own hardware and things. They're not building as much on the cloud. They start on the cloud and feel like they have to move over. And I feel like that's sort of a failure because it's really due to partly technical, like owning your own uptime, but also pricing. They can't offer competitive pricing because it would just be very expensive. And I think if you view cloud as the true building block of what the future should be, it's not going to be, if it's going to be this expensive. Right? People are going to hide the cloud from you on their own hardware, which I don't think is a win for any of the cloud platforms.
Yeah, that's interesting. I do feel like there has been just an absolute explosion of PaaS in the past few years. I feel like almost every week you're seeing another launch. And it's funny because then they end up having their own levels of automation. A lot of them have their own Terraform providers, so you can kind of automate your PaaS environment.
At launch of Terraform 0.1, one of the providers was a Heroku provider.
Was it?
Yeah, I think it was AWS, DigitalOcean and Heroku. I don't know if I got that exactly right, but I think those were the three at 0.1 that we had.
And there's a lot of PaaS. What's interesting is that Heroku moment, right? If you don't predate Heroku as a developer, you might not recognize how magical that experience was. But when we were just managing stuff on EC2 instances, like classic EC2, and we had Slicehost, we were syncing files into place, right? When Heroku first came out, that felt like the moment that almost every organization got the real taste of DevOps. It was just so easy to develop an application and get it into the cloud or into PaaS/cloud. What are your thoughts today on organizations that are going towards the cloud earlier? Can you build a business on top of a PaaS and stay there?
Yes, absolutely. I mean, one of the reasons I love no longer being at HashiCorp is I can be a little bit more, I would say, controversial about my view on cloud providers. You know, I had to be really careful for a while because we were partners. I didn't want to piss anyone off. I'm not rude to them. But I think one of the things, for example, that I would have been hesitant saying that I don't mind saying at all right now, is I think that the cloud providers - one of the best things for them but worst things for us as consumers is it feels like they've tricked us into believing that we're all too dumb to operate our own infrastructure, at any scale. I think that's where the issue is. One, it does feel condescending. I've always felt that cloud providers have felt a little bit condescending. But the issue is that at any scale. I think that realistically, startup companies, businesses just getting started... whether it's PaaS, which is fantastic, absolutely love PaaS. I always started everything on Heroku. HashiCorp started hosting all their stuff on Heroku. So like, totally fine... But I think that also running your own infrastructure is not that hard at small scale if you're willing to accept that, yes, you might have less uptime, your latency globally might be a little bit worse, things like that. But these trade offs for cost or for cognitive simplicity or something, might very much be worth it. I just think back to... even 15 years ago, not that long ago... some of the first companies I worked for that I saw these startup companies go from startup to early/mid-stage acquisition (Over $10 million acquisition so not huge but pretty good) all running on like three box LAMP stacks, right? Like really simple stuff. Again, there's cons to that. Like, absolutely, doing maintenance on your own database is way worse than buying an RDS instance. 
Way worse, but I think that the trick is that we shouldn't think that it's an absolute truth that we have to go one way or the other.
Yeah, I love that. I feel like a lot of organizations feel like they get to the point where there's a maturity level and it's like, "Oh, we need to not run on this PaaS anymore." And you can get very far on platform as a service.
Very far.
And I think you can really change the velocity of your team a bit by maybe deciding to stay. I've seen a number of organizations that go, "We're moving off Heroku. It's gotten expensive." And it's like, okay, well, you don't have any cloud experience. How many DevOps engineers are you going to hire? You know, two. That's probably more than your Heroku bill, right?
Yep.
So it's very interesting that there's so many of them have been cropping up, and it's interesting to see them in like, the different spaces, whether it's, you know, the front end ones like Vercel, or, you know, additional ones like Railway and Fly and some of the ML based ones. I think it is a pretty exciting time.
It's awesome because we went through - it felt like a decade, but it probably wasn't quite a decade - we went through like five years for sure where there was zero PaaS investment, PaaS innovation. It was Heroku which felt like it was sort of dying a slow death and then nothing picking it up, right. So now, I'm excited to see all this stuff. I've tried all of them. I use some of them for my personal sites, it's the way to go, for sure.
Personal sites. Let's talk about personal sites, because I hear you've got a new project that you're working on. So you've got Ghostty. So can you tell us a bit about Ghostty?
Yeah, it's a terminal emulator. It's something that I think a lot of people think of as a finished technology. A lot of people are surprised I'm working on a terminal emulator, and I don't want to bore people but, as I got into it, I realized there was a lot of room for innovation. There was a lot of room to make things more performant. There was a lot of room to add features. There was a lot of room to make it work better cross platform. There was all this opportunity.
I set out originally starting this terminal emulator, just as anybody else doing a side project, just to learn. I never thought I would ship it. It was just to learn a bunch of stuff. GPU programming, some low level internals, terminal emulator internals, stuff like that. But as I started building and using it, I surprised myself that I felt for me, I was like, wait, this is just better than what I previously used. And so I just kept investing and shared it with some friends, and they thought it was better. So now, at this point, it's something like a thousand beta testers at the time we're recording this and it's going great.
I feel like a terminal emulator is interesting. I spend almost my entire day in one, but at the same time, I would not even know where to get started to start building one. What was the first, like, big challenge in taking on a project like that? And then, when was the, “Oh, this is happening” moment?
Yeah. Finding the unknown is always challenging because I had the same thing. I knew the basics of terminal, and, like, I knew there was a pseudo, like, tty thing, and I actually knew how to get one of those, but I didn't know what to do with it. There's various layers I didn't really understand. So figuring that out took some time. But the joke I like to tell people is that it took about six months for my terminal emulator to be really, like, production usable - that you could use it as a replacement for day to day work. It took about six months. And I joke that for about one of those months, I was learning GPU programming. For another one of those months, I was writing a terminal emulator. And for the other four months, I was writing a font engine, because the hardest part of doing the terminal emulator probably has been fonts, font stack, font rendering, font discovery, everything.
Are you a kerning wizard?
Unfortunately, it feels that way today, yes.
Oh, that is funny. So you said it felt like there was lots of room for improvement. So, for people that haven't gotten to try Ghostty yet, what are some of the areas that you found that were interesting to focus on adding functionality?
So, the first one that people usually feel is how fast it is. When I tell people how fast Ghostty is, they usually think, "I've never felt my terminal emulator is slow. It feels fine to me every day." But it's one of those things where... the analogy I make is when you went from a regular screen to a high DPI or retina screen. I never felt like I could see pixels or felt that bad. But then you see retina and you're like, “Oh, that looks pretty good.” But then you go back and you're like “Wow, that is really bad.” Like the old one is like grainy and nasty. And that's what people have described it as. It's like you start using it, everything is just, I wouldn't even say snappy, everything is instant. Like you used to cat a file and see some stuff scroll. Now, you cat a file and the end is just on your screen. Like, all the waiting is gone for everything. So, I think that feels different. It feels more productive in a sense. Does it really matter? I don't know, but it feels good. I think that's one thing we innovated on. The other thing is just being a true native experience. It's a real Swift based Mac app, but also it works on Linux with GTK because the core is cross platform. But all the UI stuff I wrote in native code for the platform that I'm working on.
Very cool.
Being able to get this cross platform tool, your config file shares across multiple operating systems, if you're SSHing and stuff like that, but you get native macOS tabs. If you're on Linux and you have a windowing system, you get a native GTK experience that integrates with the GTK status bar. Like you have all these little details that I feel like no one else in a cross platform way has really done. And I think that's good.
There's a bunch of other ideas of newer features I want to do that I haven't really worked on yet, because my goal with 1.0 is really to make the best existing terminal emulator. But those are some of the examples and some of the features you may not know that you don't have. A lot of people use the Mac's built-in Terminal app. And whether you use Ghostty or not, I just implore everyone to stop using the built-in terminal. For example, the built-in terminal is still limited to 256 colors. Again, you may not realize this because everything degrades to supporting that, but as soon as you could actually use full 32 bit color, like RGB - actually with transparency, so RGBA - like on everything, everything just looks a lot richer. Like suddenly you start Neovim and because it detects the 32 bit color, everything looks better. And the example I give, that at least older people usually remember, is when Windows made the 16 to 32 bit shift. It's like when you started it you used to see the gradient, you used to see the blues in the Windows like start, you used to see each individual color. And then like you got a 32 bit capable computer and then it was just a smooth gradient. It's like that in a terminal is possible and it's been possible for like 15 years, but nobody works on the built-in Mac terminal. Like it hasn't changed in a decade. So please, just use anything else... stuff like that.
It's not open source yet, but can people that aren't on the team that are working on it, get access to a binary?
Yeah, so it's 1000 beta testers. You can't quite just always get access to a binary. But if you join the Discord - the Discord is about like 5000 people right now - almost a quarter have access. And I invite about five to ten people every day. If you actively work on a terminal based application or have some other expertise, like you've worked on a game and have GPU experience or font rendering experience... you get invited instantly. I love to have those types of people involved. So just join. But I do plan on releasing it as 1.0, open source, within the calendar year 2024.
Very cool. And so you haven't open sourced this yet. What's different about it? Like why haven't you open sourced?
So everyone in the beta has source access. So there's nobody that just has binary access. Everybody has source access if you have access to it. The main reason I haven't done it is work-life balance. So I have experience starting open source projects and I don't want to get inundated with issues and things like that. That's one part of it. But the other part of it is that I want to build sort of the right community culture around it. And if you just big bang something that gets some attention, then you kind of lose control over what that community really is. Like the community itself defines it really quickly before you get a chance to mold it in some way. So that's a big part of it, getting the right culture involved. And then the third part of it is really just about that first experience for people. I want it to be really great. People say this and I truly believe it, that you only really get one first impression. And so I want people to download a terminal and it works great, because a terminal is close categorically to one of those things that if it doesn't work, like if it has bugs, then it really doesn't work for you because you have to work in it every day.
Yeah, it's a commitment.
Yeah, it's almost binary in the sense that it either works for you or it doesn't work for you. It either has the features you need or you can't use it at all. There's no sort of in-between, or it's a very narrow in-between. And so releasing an early version that someone came, tried it, said oh, this isn't good for whatever reason, whether it's performance or features and then just forgetting about it and then having this preconceived notion for years that it's not that great of a terminal. I'd rather them wait a few years to see it for the first time and say, "This is pretty good." And be able to use it.
Yeah, I feel like that happens with every new editor that comes out. It's like I'll try it, I'll be like, “eh”. And then it'll be like a few years later and I see like everybody I'm looking at is using VS Code so maybe I should download the thing again.
You know, terminal emulators are like old and kludgy pieces of software. There's sort of like an ode to that that I'm doing with this development process too, which is that... I think people will laugh because HashiCorp is like one of the big top examples of a zero ver type of company. Like our projects were zero point something for so long... And I kind of want to like, not go all the way back to the dark times, but I want to go back to where like the first version that ever comes out of Ghostty is going to be a 1.0. Metaphorically, if you were to walk into like a CompUSA and it was boxed, like, it would be ready for daily work. There's no alpha, there's no beta. Like the first public release is going to be ready. And I think that too many people release software early that just isn't ready for usage. And I was one of those people, right? And I just want to, this is an ode to that. Like I want to get back to stable releases. Like the first public release is a stable release. So in a sense I'm doing a beta program, obviously, but it's closed. And in a sense this beta program to me is the QA that you would have done in a traditional company. And when everyone gets to see it, it's going to be ready.
That is very cool. I'm going to be downloading it that day because I've been in iTerm2 (which I assume is the 2.0 of iTerm, I have no idea) for a very long time, so I'll give it a whirl.
Nice.
So what are you developing it in?
The cross platform core is in Zig, and then, like I said, all the native UI elements are in native technologies. So, for Linux that's still Zig, because GTK has a C API and Zig interacts with C just natively. But for Mac, it's actually a native Swift API built in Xcode and all that.
Was this your first time working in Swift and Zig?
First time working in Zig. I had done a couple side projects in Swift, but I think more importantly, I did a bunch of Mac Objective-C work like 15 years ago. Actually, even on a contract basis, I got paid to do a few Mac apps. So I was pretty familiar with the foundations of Mac development, just had to catch up on all the new stuff.
Yeah. So what was attractive about Zig? You've been doing Go development for a fairly long time. What kind of things brought you towards Zig instead of building it in Golang?
So I think Go is great for the right use case. I would say that about every language. I think every language has its good use case, but then you should look at other options depending on what you need. I think people sometimes try to cram too much, like every language needs every feature, because it needs to be general purpose for every problem. And I don't share that point of view. And I felt with a terminal emulator, I was going to be interacting with a GPU directly, with high performance an explicit goal, and not just high performance, but as close to optimal as I could get in certain places. And so I wanted something that gave me direct control of everything and so Go's runtime was sort of like a non-starter for me in terms of making things work. And so that was the reason I looked at Zig.
So what are some of the features of the language that you love?
So the biggest one, which I thought coming into it would be a gimmick, I actually thought it was just like a marketing gimmick, is comptime. So comptime is Zig's ability to execute Zig code at compile time rather than runtime. I really thought this was going to be a constrained... like the demos they gave. I was like, it's probably the only use cases that ever exist. But really it's super powerful for two reasons. One, it's their generic system. The way they do generics is that types could be passed in as compile time parameters. So you could, for example, have a list function that takes the element type as the parameter and returns a new type created as part of the function. And you have the full language available in order to build this new type. And so that's how their full generic system works. That's super powerful. It's used in the standard lib, but I use it constantly. The other side is anytime you would have, for example, coming from Go, anytime you would have used go generate to generate a Go file or data or anything, gone. And more in the C or other languages, anytime you would have used Python or Bash or something to pre-generate a data table or something, gone. You could use one language for all of that and do it directly in the code. So that's used all over the place in the terminal. I prebuild, for example, like keyboard input codes, USB codes, all the colors that are possible, things like that are all just compile time created, and then it's just data that's quick to look up. I think that actually the best example from a practical performance standpoint was... When you type a character, the letter ‘a’, or a Japanese symbol or an emoji, you have to know how many cells it's supposed to take, one or two or anything. And the total amount of Unicode characters, code points, there's about 1.1 million... and I was able to just use comptime to pregenerate a lookup table and compact it in the memory for all 1.1 million code points.
And that table fits in an L1 Cache. So it's really, really fast. And it was about five times faster than doing the runtime data computed version of the code point but I didn't have to write a complicated script for that. It's just a Zig function that happens to precompute its results at compile time instead. So I think that's the most attractive thing. There's a ton of other cool stuff in the language, but I think that one is the most unique.
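[Editor's note: a minimal sketch of the two comptime uses described above, assuming Zig 0.11 or later. It is not Ghostty's code. The names FixedList, computeWidth, width_table, and cellWidth are hypothetical, the table covers only the first 0x4000 code points rather than all 1.1 million, and the width rule (only Hiragana counts as two cells) is a toy stand-in for real Unicode width data.]

```zig
const std = @import("std");

// Generics via comptime: the element type and capacity are compile-time
// parameters, and the function returns a newly created struct type.
fn FixedList(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        items: [capacity]T = undefined,
        len: usize = 0,

        fn append(self: *Self, value: T) !void {
            if (self.len == capacity) return error.OutOfSpace;
            self.items[self.len] = value;
            self.len += 1;
        }
    };
}

// Toy assumption: only Hiragana (U+3040..U+309F) is two cells wide.
fn computeWidth(cp: u21) u8 {
    return if (cp >= 0x3040 and cp <= 0x309F) 2 else 1;
}

const table_len = 0x4000;

// The initializer of a top-level const is evaluated at compile time, so the
// loop below runs in the compiler and only the finished table is emitted.
const width_table: [table_len]u8 = blk: {
    @setEvalBranchQuota(100_000); // raise the default comptime loop budget
    var t: [table_len]u8 = undefined;
    for (&t, 0..) |*slot, cp| {
        slot.* = computeWidth(@intCast(cp));
    }
    break :blk t;
};

// Runtime lookup: a single array index instead of recomputing the width.
fn cellWidth(cp: u21) u8 {
    // Outside the toy table, fall back to one cell.
    return if (cp < width_table.len) width_table[cp] else 1;
}

pub fn main() !void {
    var list = FixedList(u21, 4){};
    try list.append('a');
    try list.append(0x3042); // HIRAGANA LETTER A
    for (list.items[0..list.len]) |cp| {
        std.debug.print("U+{X} -> {d} cell(s)\n", .{ cp, cellWidth(cp) });
    }
}
```

[The same ordinary function, computeWidth, serves both purposes: called normally it runs at runtime, and called from the table initializer it runs inside the compiler, which is why no separate generator script is needed.]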
That is very cool. That is very cool. I definitely have to check Zig out. I've seen a couple of projects popping up in it and checked out its homepage recently. But I'm in that place where I'm like, "What can I build with this?" And I guess you thought a tty.
Yeah, I think a weak point of Zig right now, just due to the ecosystem... not the language, the language is general purpose... but the ecosystem is networking. So it's not very good for web services and things like that. It might never be the right choice for that, but right now, for example, you just have a lot in the way of even making that work. But for something like desktop software or just exporting a C library for some high performance operation, it's really, really good at that.
If we switch back to the cloud for a bit, what I'd love to know is, looking ahead, what do you think are some of the bigger challenges that we're going to be facing in infrastructure as code and cloud with some of the changes that are happening around ML and AI? Do you think there will be a big shift there of the way that we do our work as operations engineers or developers, or are you more bearish on the idea?
So I think there's almost two questions you asked there, which is one about, like, challenges in cloud in general. But then you asked about ML specifically. So I'll talk about the ML side. I think ML as a paradigm... or rather than ML, I'd say like Generative AI as a paradigm is here to stay. So I think that's going to impact everything that we do in terms of the way we work. Like I use Copilot now, and I think that in any type of file, whether it's Terraform file or your actual app code, that's going to become a really critical thing and you're going to have to learn to work with it or work around it or however you look at it. But then I think on the other side, like, I think consumers asking for AI based functionality is also going to keep happening. So it'll impact us on both sides. So for sure.
As far as DevOps and operations engineers, how do you think this will affect what we do? I feel like when you look at a lot of the GenAI out there today, you'll see it work very well for maybe creating a sorting routine, but when you start getting towards things that are very business logic, you can see it kind of fall apart. But when you think about operations, I feel like... let's say that we trained AI on just the most immaculate Terraform code base there is. It's still missing half the picture, which is how this stuff actually operates. How it runs in the cloud. The metrics. What your user patterns are. How do you think that AI will affect or GenAI will affect operations engineers in the near term? Like, do you think it's going to have the same impacts that it's had on developers and helping them write code, or do you think it's still too far away, just kind of how the context is missing?
Yeah, I mean, I think that GenAI today, as impressive as it is, is still extremely stupid. But I do think that'll continue to rapidly change and improve. Like, I think it would be very naive to dismiss it, right? To just say, "Oh, it's always going to be stupid, always going to be useless." I think that could potentially be a very critical error. So I'm more bullish than that. But at the same time, I don't think it's sort of the magic pixie dust today that a lot of people paint it out to be.
I've found it to be an invaluable tool with a human in the loop. Check and massage repeatedly. Like just that loop. So it really feels like every day as a developer, for example, with Copilot, it's really about writing some code, knowing how to get Copilot to emit what I want it to, checking what it does, and then just doing that on a multiple times per minute basis. I'm not the type of person that tries to put in the perfect prompt and have Copilot generate a full function, even at that scale. I'm more the idea with generative AI and development, where you're really trying to get it to generate the next few lines properly and you kind of get... I've felt at least using it for the past couple years, that I feel pretty comfortable knowing how to get it to generate those right things. And I kind of know in my mind what I want it to generate. So I'm not doing stuff that I don't understand, it's stuff I understand. I'm just doing it to save myself time, save myself keystrokes. Like small aside, I think GenAI is going to be huge for lowering RSI and things like that because I'm just typing way less. I'm more typing just enough to get the rest of the three lines perfect. So I'm not doing as much. And so I think it's the same thing with infrastructure. I think that autocomplete on one side is one way to look at it, but I think the other way to look at it is anomaly detection and remediation suggestion, I think, is going to be huge. So I think automatic remediation, pretty bad idea with the current state of GenAI being, like I said, pretty stupid. But at the same time, I think something that's looking at this mass of information and metrics and just pointing out anomalies... We used to try to solve that with just statistics and human eyeballs looking at different things. You know, how do we store all this data in a small amount of space and draw it in the right kind of graph that makes sense to people? 
That's still going to be important, but I think less so because something can be looking at it and say, "This doesn't look right. I would do this. Does that make sense to you?" And just having a human come in and see if they need to take any action or tell them, "Nope, you were wrong." And stuff like that, I think that'll be really good. There was one startup company, I was just talking to their founder, and they're doing stuff with Terraform where they connect to your observability platform like Datadog, and then they connect to Terraform cloud and GitHub and all those resources and whenever they notice an anomaly in the metrics, they try to use AI in order to correlate if infrastructure as code caused that, or if it didn't cause it still, is there a change in the infrastructure as code you can make? They're trying to actually get to the point where they'll make a PR to suggest a potential thing, but their whole thing is still like, it's a PR, you still should think about it, understand, before you merge it. But I think stuff like that is only going to get better.
Yeah, this is what I've hoped is it's going to make us... we're still going to very much do our jobs, we're going to do the part that we love, we like writing code... but it's just going to make us more efficient at it. We don't have to have a DevOps panic that we're not going to have jobs anymore. It's going to make us more efficient. And you know, it's funny, I saw your tweet a few weeks back about being mean to AI. I am also very mean to AI, but I also use it every single day. So I use a tool called Mods. It's like a CLI tool for making prompts to just the different AI tools that are out there and I have it incorporated in like a number of my scripts. Like, I'll actually pipe stuff through ChatGPT, have it generate some code and then pipe it into something else. And I felt very similar... I didn't attribute it to RSI yet, which is funny because I actually get tendinitis really bad. But it is interesting. I'm curious if there'll be a dip in tendinitis ten years out as we're using more AI.
I hope so. Yeah.
I just saw a talk. I was just at the Systems Distributed conference and I just saw an opening talk, and the speaker said something about GenAI. It wasn't about GenAI at all, but like a part of it was. And I was super nodding my head because what he said was, basically, "programming is going to be commoditized, but engineering will always be valuable." And I think that little nugget is exactly right. The act of writing code, even today, it's a stupid task. Like a lot of times engineers in their head know exactly what they want. It's just you have to type all these words to get it out and you know exactly what you want to type. It's just going to take you the next, you know, seconds to minutes to type it all out. If we could get to the point where our thinking could map to the computer to just give us the right thing, like we're doing the engineering, we're picking the right abstractions, the right abstraction level, the right algorithmic choices, the right data layouts, things like that, and we're just having the computer do the programming for us. I think that's exactly the type of future that I see being possible with this. That's sort of how I feel already on a very basic level with Copilot.
Yeah. Is that from a talk or was that just a conversation?
No, that was from a talk. It was the Day 2 opening talk for Systems Distributed 2024. So I forgot the speaker's name and I forgot the talk title, but it was the first one. The whole talk was great and it wasn't about GenAI, that was just a part of it. The whole talk was about “What is systems engineering?” And the conclusion, not to bury the lede or give anything away, but the conclusion really was you could be a systems engineer in JavaScript, for example, because systems engineering is not about what layer you're programming at or what level of performance you're programming, it's about the engineer's ability to think through and understand a problem and solve it at the right layer. And so that was sort of the whole thing about it. And it was a great talk.
Yeah, I really like that quote. I'll make sure to include that in the show notes. We are coming up on time and I want to be, you know, respectful of your time and I really appreciate the hour or 2 hours you've given me. So we actually recorded this once and lost all the audio, so we're doing it again.
You've been at the forefront of infrastructure as code and DevOps for a very long time. What are your hopes for the future of this field?
I think we touched on it, but I think simplicity. I would say that generally, simplicity in the technical side, simplicity in the process side and the billing side. I think that we've built so much capability via complexity for so long at this point. And I do say we because I'm not excluding myself from this. I would love to see a little more simplicity or challenge a lot of these thoughts. I think that's one side. I think the other side is not accepting anything as the forever solution. I mean, I think there's a Steve Jobs quote and I just don't remember the quote, but the idea behind the quote was basically like if you're not putting yourself out of business, someone else will. And that applies to everyone. You could pick any of the biggest, most prevalent... I'll pick on myself with Terraform, but I'll look at Kubernetes, I'll look at AWS with cloud providing... I mean at any layer, pick a technology that feels like nothing could ever displace it and give up on that thought because I think the best innovation comes from the audacious thought that what if this hyper prevalent thing no longer existed. You know, think through that challenge. And I think that ties in with the simplicity thing because I think if we try to iterate simplicity on top of everything that has been built in the past 15 years, it's not going to be that simple. So I think there has to be some critical challenge of different parts of the stack. I'm not saying it's necessarily a Terraform piece. I hope it's not. But I think there's different layers that we should think critically about.
Very cool. Well, again, thank you so much for the time. And where can people find you online?
I'm easy to find. GitHub, Twitter, wherever. I don't hide my email or anything. It's just Mitchell H pretty much everywhere, mitchellh.com, and if you have any questions or anything, I try to answer the emails I get.
Awesome. He answered mine, so drop him a line. Mitchell, I appreciate the time, again. Thanks so much and thanks everybody for tuning in.
Thank you.