Cloud Migration Strategies with 66 Degrees

Episode Description

Navigating cloud migrations and building modern platforms is challenging in the best of circumstances. Alex Voorhees, VP of Cloud Engineering at 66 Degrees, shares valuable lessons from helping organizations as they take on these challenges. 

Don’t miss his insights on:

  • How to tackle the human and organizational challenges that come with cloud transformation
  • Practical strategies for upskilling teams transitioning from traditional ops to cloud operations
  • Key considerations when implementing platform engineering solutions across different organizational maturity levels
  • Integrating AI capabilities into cloud architecture
  • Common pitfalls to avoid when moving legacy applications to the cloud
  • Approaches for balancing innovation with practical business needs during cloud migration

Whether you're leading a cloud migration, building a platform team, or interested in the future of cloud operations, this episode offers concrete takeaways for navigating the technical and organizational challenges of modern infrastructure.

Episode Transcript

Welcome to the platform engineering podcast. I'm your host Cory O'Daniel. Today I've got with me Alex Voorhees, Vice President of Cloud Engineering at 66 Degrees. 66 Degrees is a professional services firm specializing in helping orgs migrate to the cloud and build modern platforms that empower their teams. Alex has a wealth of experience in managing cloud migrations and tackling the human and organizational challenges that you often see with these transformations.

Alex, welcome to the show. Excited to have you here today.

Thank you. Looking forward to talking.

To kick things off, could you just tell us a little bit about your background? Where you started and how you worked your way up to VP of Cloud Engineering?

Yeah, definitely. So I joined a startup as a software engineer (I'm still at the same company, we have a different name). It was a five-person startup at the time when I joined and I was doing a lot of custom software development. As we've grown and expanded into work with more enterprises, I've been lucky enough to kind of grow up with the org. I've spent time on our solutions team and our sales team, and then leading the professional services team, which is what I'm doing right now.

Very cool, very cool. You guys work with like a ton of different types of organizations and you specifically focus on cloud migrations and app modernizations, right?

Yeah, so we work across enterprises and medium-sized corporations focused on three main big pillars of work. The first is cloud engineering, which is what I oversee. That's infrastructure and DevOps, and ultimately that manifests itself in a lot of migration work. We also have a data practice that focuses on pipelines and analytics platforms. And then our AI practice, which has been exploding, as one can imagine, that focuses on model building and MLOps and all that good stuff.

One of my things that I'm very passionate about, well excited might be the right word… there's a lot of operational knowledge that we're not losing per se in the cloud, but every year we're making more and more engineers and people don't really learn the cloud until they're in production. So this has been a kind of passion project of mine. One of the reasons I started this podcast was just to kind of talk more about operations and more about engineering.

You guys are kind of facing that problem, but at scale, right? You're working with a lot of these organizations that may or may not have cloud experience, but they're trying to move there. I'd really love to know what types of challenges you see with these orgs that may have traditional SysOps admins working in these data centers and then are getting presented with the cloud for the first time, where most of it is software.

The People Problem in Cloud Migration

You know, I think… I go back and forth on this, but it's really the people problem that we find is the hardest problem to solve when we're working with organizations. What I mean by that is we're introducing kind of new skill sets for engineers to learn and new business problems for organizations. 

Some folks leaving a data center aren't used to deploying, for example, an active-active Kubernetes cluster where they have to be live in two regions. That's a huge organizational lift and kind of mind shift for that team. 

The joke that we often make and we've run into and I've had a lot of CIOs tell me this is “We've done a great job building a data center in the cloud.” What they mean by that is they've kind of regressed to the mean a little bit, instead of using that journey to the cloud as a possibility to transform. They just take status quo and move it to a different data center. And that is not what we're trying to do. 

What we have found to be really successful with these transformations, and especially with large enterprises that might not move as fast as a nimble startup, is establishing relationships in multiple levels of an organization. You need executive buy-in at some point for like nuts and bolts budget and top down mandates. But being able to work with the engineers kind of hand in hand and build trust with them - saying, “Hey, here's how it's going to be a little different and here's why that's okay.” is really, really important. 

That's one of like the first things we try to do when we start working with organizations - get a lay of the land of how it works there.

It's interesting because there's an upskilling that obviously has to happen to get access to your work product at the end of the day. It can be pretty invasive. Everything is changing for the ops folk that are moving to the cloud. Even for the engineers it's probably changing a bit too, because I assume a lot of times they start to take on maybe some of that ops work themselves because they can manage it with software. 

Team Dynamics and Infrastructure as Code

What is the story like for those two teams when they start working with a company like 66 Degrees and kind of acquiring those skills while working with your org to do these migrations?

I joke with one of our chief architects - who writes the Terraform? When we walk into an organization, we walk into a room and there's the app team and this newly formed cloud team. And the app team goes, “Well, you're the cloud team. You should be worried about deploying it. You write the Terraform.” And the cloud team goes, “No, you have to know how to maintain it. You should write the Terraform.” And it's just like this classic debate that happens in almost every organization I've been in.

The way I think about it and what I try to coach folks is learning Terraform… and I'm just going to use Infrastructure as Code as an example, because it's kind of the most tangible... you have to do it. It's a great upskilling. There are some weird logical leaps that you have to make, but if you're not making this change and being part of the wave, you're going to be left behind from a career perspective. 

I think it's even more interesting now, thinking about how generative AI and some of the coding assist tools come into this. We're finding more and more that we can use them to kind of cut down on the learning curve for folks when they have to start writing code for the first time. 

We like it because my mantra is the best way to deliver business value is to be as close to the code as possible. We're not a strategy consultant. We are going to get in and be hands on with you and that involves getting your hands dirty. That's kind of our MO, if that makes sense.

Yeah. I don't know if you saw me like “Ahh!” 

Yeah.

Just straight up vibing there. I was just at Kubecon last week and that little joke that you have is literally how we start our sales pitch. We're like, “Who does this in your org?”  And it's funny because people are like, “Eh, everybody.” or like there's very different answers depending on the maturity of the org - where they are in the DevOps maturity model. But then there's like, “I do, but I don't like it.” 

It is a running joke, but it is a substantial decision/problem.

Oh, it is. I've even seen orgs say, “Well, that application is really important. We're going to give that team white glove treatment. Everyone else is going to have to learn how to deploy it on their own. We're going to deploy it for this team.” You're not going to get the most value out of that move.

That was my point of like, you've kind of rebuilt the data center in the cloud again. If you're sticking to those traditional models, that's how it's going to feel - you're recreating it and this whole value transformation just didn't happen.

Cloud Migration Challenges and Best Practices

Yeah, I feel like that's one of the things that's just truly hard about IaC too, right? When we look at any given cloud API, there are some attributes that are very ops-centric and then there's a bag of them that are very developer-centric. But this module that we're using is both. We both need to type the code in to get the thing to work, or we're shoulder tapping, right?

Yeah.

I see this all the time, day in, day out. I have friends that start companies where they're trying to figure out “Do I bring on somebody to do the Terraform or is that my team's responsibility?” 

What are the different personas or profiles you see? What leads them down “The ops team does it.” versus the app team versus it's some collaboration layer?

What we see the most of is it going to the app team and it’s a little reluctant.

Yeah.

They're the ones that are responsible for maintaining. The cloud team or the ops team kind of serve almost like a COE - a center of excellence. They're kind of like part of the architecture review board (or whatever type of review board you have), and they're coming in and kind of advising. And so we really see the app team taking control of the situation, which I think is ultimately pretty good for the most part.

It's one of those things that's funny, some of these companies where the ops team does all the Terraforming it just kind of becomes like … not Terraform specifically, like any IaC… it kind of becomes like your new service now. It's like I want a database and instead of me just doing it myself, I file a ticket and this guy writes some Terraform. 

We do see folks try to implement self-service for their developers, whether it's a tool like Backstage or Spacelift. We haven't seen those be really super successful at some of our more traditional enterprises. I know they're really successful at product companies, but we've seen those kind of struggle to get adoption. I haven't really figured out why yet, but it's been interesting. The self-service platforms that we've seen have kind of failed a little bit, which has been interesting.

It is an interesting one. I've got buddies in both of those worlds; I’ve got a bunch of friends that work over at Spacelift.

Self-service, I mean, it is the ultimate goal, right? But how much self-service do you give? Every inch you give, there's some maintenance or some sort of burden there for somebody to kind of take care of. 

The orgs doing these migrations and looking towards something like we actually want to completely unblock our Devs and lean into self-service, what is driving somebody that direction versus maybe not looking towards something like a self-service platform?

I think the ones that are saying, “Hey, app teams, you're in control of it.” are coming to it with a level of ownership. And so in that organization, from the top down, they're making it very clear that the cloud is part of their everyday.

We even see this with logging and monitoring. Where people haven't wanted to change the way they log and monitor, they're still going to use some of these old legacy tools. And when you shift it to the app teams and put it in their control - I think from the management team, they're saying the cloud is not just a place for you to deploy a virtual machine, it's going to be part of something that you're doing. 

What it really lends to is them then learning the cloud native aspects of all of these public hyperscalers, where you can really take advantage and get value out of them. Versus just like migrating over to a virtual machine. If they're running Terraform and reading the documentation of how this cloud works, they're going to get more ideas on how to better their application as opposed to just kind of like pawning it off to a service person to go deploy it. 

They're going to be in the platform. Actually having to understand how it works. Thinking about better ways to re-architect. Thinking about better ways to build their application. Versus just kind of pawning it off. 

Those are the organizations that are really getting value out of the cloud.

It's funny because the go-to is just the easy way. I was in professional service for a while… it's easy to just toss it into a VM, right? 

I think one of the things that I've seen personally… I'd love to know how you guys approach this…  we're all hobbyists at heart, right? This is what we like to do, we like to write software - we do it at work, we do it at home, we do it in our home lab. One of the things I think that is hard when you first get into the cloud is like it's exciting. 

If you've been in a data center and you're an app team, your interface to those machines for a lot of times is like a Jenkins build - I push some code, some magic happens in Jenkins, some stuff ends up on some servers, great.  But then when I'm getting to the cloud the first time I'm like,  “Ooh, I’ve got a little Lambda here I can grab, I’ve got all the databases I could ever want, all the different types of databases I could ever want.” 

How do you help teams figure out what the right services are to start to adopt? Because it's all shiny and new, it's all that shiny bauble. How do you help them craft, this is the thing that's going to make the most sense for your application and business versus this is what's fun and exciting and going to get people stoked about moving to the cloud?

I think, you know, it's a little corny, but….

I love corny. That's the best way to start. [laughing]

We really like to think about what the business outcome is before the how. Every engineer loves to jump right to the solution. I've yet to meet someone who likes to sit there and talk about why we're doing this for an hour. 

It's really important for us to say, “Okay. What do we need this application to do? How much revenue is it serving or how many people is it serving (whatever metric they want)?” And making sure everything can tie back to that outcome is really what it comes down to. 

Yeah.

Having some golden state architectures where we say, “Alright, when we need this application to serve this many people or have this type of connectivity, here's the playbook for how to do that.” Crafting that with the organization, obviously, not just by ourselves. 

If we don't do that and we just spur all this excitement, what will happen is that something's going to break in a year and then we're going to be blamed for it, which is fair. 

Starting with the outcome of what we needed to do - we spend a lot of time thinking about that with our customers. 

For anybody who hasn't worked as a consultant  - everything's definitely blameless until you're a consultant. They can pin it on you all day long when it's a consultant. That's part of the job, that's why the money is there. 

Yeah, sure. I would argue it's more of a shared responsibility, but yeah, I know what you mean. I've been doing this long enough, I get it.

You work with a lot of orgs, a lot of different migrations, and I know that you're very into the people side. If that's the problem that you love to focus on the most, I’m happy to talk about that, but like what are the types of projects… All these orgs are so different. What's the kind of project that you like to see… like starting to do one of these cloud migrations that actually gets you the most excited? Is it like, “Oh this old COBOL system that we're going to revitalize in the cloud…”

My favorite type of conversation to have with customers is a little bit like what you just described. Not a COBOL system… let's slow down a little bit, the joke I like to say is sometimes when we're talking about old applications, it's kind of like if you lift a log up in the forest and then a bunch of like bugs crawl out from underneath the log. That's sometimes how I feel. We walk into a meeting and they're like, “This hasn't been updated in 15 years. And the person's retired that wrote it, so we don't know what's going on.” 

Ohh.

Those are really fun for me just because it seems like an insurmountable challenge, but…

Wait, how often do you get one of those? 

That happens… I would say if the organization has maybe over… if they're a Fortune 500 company and they haven't yet moved to the cloud, they have those hanging out in their environment.

They have those. What's the most exciting about that? Is it like the archeology of it? 

Yeah, so I like two things about it. One is the archeology and two is figuring out why they made some design decisions that made anyone touching it not possible. You know the adage about programming is like playing chess against yourself? That is exactly what's going on there. That's one kind of archetype that I really like. 

The other one is a newer one that I'm getting really excited about. We were with a client two weeks ago (and I think this is a lot of where app modernization is going) building some proof of concepts for them to use some AI tools to perform Java upgrades for them. It was taking in their documentation standards, taking in their code base, and then creating pull requests on GitHub where the engineer would go and review them. Those just get me really jacked up because I hated upgrading code. Now it's kind of like all I have to do is be a reviewer. 

So those are kind of like the two, I would say, classifications that make me really, really excited.

The archaeological digs are so much fun!

So much fun.

I don't know if I still have it on my Twitter, but that used to be my title - cloud archaeologist instead of architect.

There you go.

Most of my job is not like sitting back and thinking about like how nice something's going to be and drawing some clean lines. I'm digging a hole that somebody else had already started digging. I'm cleaning a rock. [Alex laughs] There's some snakes in here.

Yes. It gets gnarly really, really quickly. Those are most fun when they're kind of maybe potentially a little bit low pressure applications. 

Yes.

It gets a little more serious when they're like, “It's 25% of our revenue.” And we're like, “Wait, what's going on here?”

We're like, “It's a life support system!” and you're like, “Okay…?”

What were you doing? What prevented it to get here? 

Is AI Driving Migration to the Cloud?

What is the big driver? Obviously there are plenty of workloads that make sense on-prem… we’re seeing like a bit of repatriation, the DHA chart was like everybody's going back to on-prem now. It’s like, “Are they?” I'm still seeing a lot of people go into the cloud.

It’s not clear.

Yeah, it's like a little here, a little there. It's just kind of shifting around columns.

What is the big driver of the organizations you're working with? Is AI a lot of the driver of like moving to the cloud? Or are they still looking at just like, “We need the lower cost. We need to make teams more agile.” Do we have those more traditional motives?

The last year has been really interesting because if you'd asked me that question a year and a half ago, I would say it is definitely still cost savings. This is not the view of my organization, but cost savings is a highly debated topic of is it cheaper to run in the cloud versus on a data center? I think, depending on your financial engineer, you could make a case for both, right? Even though I feel pretty strongly the cloud is cheaper. 

We see people exiting data centers or leaving a contractual obligation that they're not really fond of. So that's like the VMware folks who are like, “I'm not dealing with Broadcom.” or, “We need to get out of this enterprise agreement with Microsoft, for the last 20 years… we're kind of done with that.”

There's still kind of that traditional or typical cloud migration. But what we're seeing a lot of now is folks saying, “Hey, is our architecture of this app that we migrated two years ago, ready to be consumed by AI or with AI? We have to harden this or start exposing more microservices so that we're thinking about it to be ready when we need to get there for some AI work.” 

That has been really, really interesting. It introduces a new set of problems for cloud engineering that have been exciting for me. One of the things I like about cloud engineering is you're kind of always in the middle of that. 

It's kind of like both. We have a lot of… especially the past year… we've gone from POC to production in the AI world where we're really focused on production now. When we start getting things into production, it introduces a whole new set of infrastructure problems and cloud problems and DevOps problems that we're tackling right now. 

How do we create a service oriented architecture for AI agents to talk to each other? Not many people are talking about that, and this is what's on the top of mind of a lot of CTOs and CIOs that we work with. They're like, “Okay, great. We're in BigQuery (or we're in Aurora) but I have this product team bugging me about getting this agent deployed. How are we going to do that safely and securely and make it scale and not have it blow my budget out of the water?” 

Those are all really new, interesting tasks for us. 

Yeah, I haven't really started working with agentic stuff yet. Do those tend to be request based or are they kind of like actors that are acting on their own? Are they handling HTTP requests and just like doing something in a long-running job? Or does it tend to be more like background workers, where they're interacting with different systems?

We have been thinking a lot about it as the back room actor. A really good example of this is like a procurement agent, if you will. There's a procurement agent that's… and this is where I think some of this is going, I think this is really analogous to like the journey to the cloud. This logical and change management leap that I was talking about, how people are resistant to change. Like this is the same thing, we're kind of entering that paradigm in my opinion.

Okay, so we have this procurement agent, they're kind of acting like a business analyst. They're responsible for procuring a VM from whichever hyperscaler is the best based on these criteria. That AI agent has to somehow talk to the ERP system through some sort of microservice backend connection and then deploy it into the cloud or execute the purchase agreement. I would call that a backend - kind of happening behind the scenes. 

What's really cool is it introduces - how do you monitor and evaluate that agent? What's the rubric that they're graded on if you treat them like an employee who was doing a similar task way back when. Those are really early on conversations. 

The cloud team, when we get involved, it's like, Is our architecture hardened for this? What's the API interaction between these agents and these other services? How are we logging and monitoring? What's the security around them?

Those conversations are just starting and it's an exciting time to be in that space. So we are definitely having those with customers. 
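
To make the rubric idea above a little more concrete, here is a minimal Python sketch of how a single decision by a procurement agent might be graded. It is only an illustration: the criteria names, the weights, and the `AgentDecision` fields are all assumptions made for this example and were not described in the conversation.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: float  # relative importance; the values below are illustrative


@dataclass
class AgentDecision:
    chosen_price: float     # hourly price of the VM the agent picked
    cheapest_price: float   # cheapest offer that met the requirements
    met_requirements: bool  # did the chosen VM satisfy the stated criteria?
    logged: bool            # was the decision written to an audit log?


# Hypothetical rubric; the weights sum to 1.0 so the score stays in [0, 1].
RUBRIC = [
    Criterion("met_requirements", 0.5),
    Criterion("cost_efficiency", 0.3),
    Criterion("auditability", 0.2),
]


def score(decision: AgentDecision) -> float:
    """Return a weighted score in [0, 1] for one procurement decision."""
    if decision.chosen_price <= 0:
        raise ValueError("chosen_price must be positive")
    parts = {
        "met_requirements": 1.0 if decision.met_requirements else 0.0,
        # 1.0 when the agent picked the cheapest valid offer, lower as it overpays
        "cost_efficiency": min(1.0, decision.cheapest_price / decision.chosen_price),
        "auditability": 1.0 if decision.logged else 0.0,
    }
    return sum(c.weight * parts[c.name] for c in RUBRIC)
```

An agent that meets the requirements and logs its decision but pays twice the cheapest valid price would score 0.85 under these made-up weights. In practice the interesting work is agreeing on the criteria, which is the conversation described above.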

That's interesting. This is just a space I haven't really gotten into too much, but it seems like there's almost like a new class of error. There's errors in prod, like prod goes down, you get a 500, file not found, whatever. Then there's our bugs. I didn't get the logic quite right, I introduced a bug. But then there are hallucinations…

Right.

Where the system just starts behaving oddly, right? Especially when maybe this one's procuring some stuff and it's like, “Hey, you know what? I could use a little more RAM.” [laughs]

Yeah. And so how do you evaluate that? We're starting to have those conversations. Obviously, I think, the folks that are doing that level of autonomy are on the bleeding, bleeding edge.

If we talk back to like the class of projects we're working on, if you're not thinking right now when you're deploying architectures and just the cloud about  how would this be consumed by something that is not a human or a backend service that's kind of a typical HTTP request… you have to start kind of reorienting the way you're thinking about it. Which has been fun.

I started dabbling very recently in a proof of concept. It's interesting because the one I'm working on is to help people through bugs in their Infrastructure as Code. Like, I found something not quite right. It's interesting because there's a lot of context there that me, as a developer, if I was working on it, I just know. I've done this for a while. A model might be able to generate some stuff, but there's a lot about the system that this Terraform (or whatever) is going into that it doesn’t know.

The model's trained on God knows what, but then there's my metrics, my actual architecture of the cloud, other systems, what is the request volume?... It starts to get very interesting. There's a lot of information that I can give a model, but how much information do I need to give this model, right?

That's one of the things I'm trying to figure out now, with some of my customers. Like trying to figure out how much context do I need to give this thing that it doesn't currently have access to, to get a good answer without giving it too much.  Because it might be lobbing over like five megs of information, right?

Yes, that's really interesting. I haven't thought about that before, that's really cool. I haven't done that yet.

It's weird. I don't know if anybody outside of like OpenAI and Anthropic knows what they're doing.

Yeah.

I'm definitely like caveman style - I'm just kind of beating a wrench on a model. 

No, that's cool.
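
The "how much context is enough" question raised above can be sketched as a simple budgeting problem. The Python below is one hedged way to frame it, not a description of any tool mentioned in the conversation: the `ContextItem` type, the caller-supplied `relevance` score, and the greedy strategy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str        # e.g. "terraform plan output" or "request volume metrics"
    text: str
    relevance: float  # hypothetical 0..1 score assigned by the caller


def pack_context(items: list[ContextItem], budget_bytes: int) -> list[ContextItem]:
    """Greedily keep the most relevant items that still fit in the byte budget."""
    chosen = []
    used = 0
    # Visit items from most to least relevant; skip anything that would overflow.
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        size = len(item.text.encode("utf-8"))
        if used + size <= budget_bytes:
            chosen.append(item)
            used += size
    return chosen
```

A greedy pass like this is easy to reason about, but it does nothing to answer the harder part of the question, which is how to assign the relevance scores in the first place.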

AI Implementation in Professional Services

With the clients wanting to move to the cloud because they're getting into AI the first time…  you guys are working with AI with a lot of organizations. There's obviously your customer's goals, but how is 66 Degrees looking at AI for how they help you serve your customers better? 

We trialed code completion tools or coding agents and our engineers that are hands-on-keyboard are using that. 

I have to review statements of work in our organization (it's one of the things I do) and I have an agent or a model or whatever, pre-read the statement of work before I get to it and kind of like check for items that I always call out. So like, are we going to prod or dev? If that's not in the statement of work, we need that. I have a list of questions that I ask of it before I even read it, which is great.
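
The pre-read described above is essentially a checklist run against a document. As a rough illustration only, the Python sketch below uses plain regular expressions as a stand-in for the model; the second checklist question and both patterns are hypothetical, and only the prod-or-dev question comes from the conversation.

```python
import re
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    question: str  # what the reviewer always asks
    pattern: str   # hypothetical regex suggesting the document answers it


# Only the first question is from the conversation; the rest is made up.
CHECKLIST = [
    ChecklistItem("Are we going to prod or dev?", r"\b(prod(uction)?|dev(elopment)?)\b"),
    ChecklistItem("Is a timeline stated?", r"\b(weeks?|months?|deadline)\b"),
]


def pre_read(sow_text: str) -> list[str]:
    """Return the checklist questions the statement of work does not seem to answer."""
    return [
        item.question
        for item in CHECKLIST
        if not re.search(item.pattern, sow_text, flags=re.IGNORECASE)
    ]
```

Calling `pre_read("We will migrate the billing app over 6 weeks.")` flags the prod-or-dev question, because the text mentions a timeline but never names an environment. A real version would hand the same questions to a model rather than rely on keywords.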

We've seen our sales team be really successful in background research about potential opportunities for customers and learning about the organization.

Oh.

Kind of asking of the thing, “They just did this project, what's the next best project for us?” And we obviously feed in all of our data from our CRM system that says, “Hey, they did a migration, they did a database proof of concept.” Chances are, here's the next best project for them to work on and for you to go pitch them. So you can feed in a lot of your enterprise data and get that out of it.

Then the last, which is a fun one for us, because I think… it's funny, we were just talking about this this week, like I don't think any engineers… I've yet also to meet one who like loves writing documentation. I don't know if that's you? But like I'm not, no one's a big documentation person.

I love the concept of writing documentation. [laughing]

Yeah. We have potential metrics. 

As an engineer, if you write one case study a year, it's going to show up on your performance review - great. And what we have seen is being able to take technical design documents like TDDs and convert those into case studies just automatically.

Oh.

And then you tweak them. You can say, “Hey, what's the situation? What's the complication? And here is the outcome.” Enough of that is in the TDD that our engineers can go and do that (and then they obviously edit it and tweak it as need be). Just being able to crank out a 75% solution quickly is actually game changing for us and for a lot of folks.

Case studies are hard.

They’re really hard. 

Even just getting started on one can feel like a task. Having it like 70% of the way there, where you’re just in like edit mode, that actually seems pretty valuable. At least from my time, one of the most valuable things you can do as a consultant is spending that time to make materials that let people know how well you do. Trust is the number one step, I think, in selling some sort of operations package to somebody.

100%. Those tasks really help our customers. The other joke that I like to say… and when people talk about how AI is going to take their job away in the IT space… it's like, I've never met someone in IT or a CIO who is like, “Hey, we have this week free from work. Like there's no backlog.” Everyone is time strapped.

Yes.

I've never met anyone. Like open up someone's Jira and look at their backlog and I guarantee you they're like a sprint or two behind. 

No one is operating like that and if we can help them do something that is 10% faster, it's going to make their lives… Like that's the big thing with the change, we're not replacing you… obviously, maybe some things will be replaced in the future…  but if we can get you 10% of your time back today, let's do that and make it a little bit easier on you. 

We are seeing that with both AI and even moving to the cloud - like an app mod project, there are ways we can save that time for people.

Yeah. Sorry to stay on AI so much, but I know that you guys are doing a ton with it so I'm just super interested.

Yeah.

I had a really great interview a few weeks back with someone from Google and they were talking about how they're starting to incorporate and starting to think about how AI exists as a part of the platform. Like your platform, your business's IDP or whatnot. Are you seeing companies looking to incorporate AI and like their models into the platform they're giving to their devs? Or is that like far off from where they are? They're just trying to get their apps to be able to use it.

I think the percentage of companies that are doing that is slim. 

Okay.

But I don't see that being too far away - having an agent. Like I'm really hot on this agent stuff right now. I don't see it too far away where you have an agent on your DevOps team helping you complete your job as a co-worker. I think that is definitely coming. 

Right now we're still seeing people doing things like chatbots, enterprise search, model building - successfully and moving them into production. But no, I think we're early adopters of folks that are doing that.

Yeah. I feel like at some point in time you're going to start seeing your agentic bots showing up in Backstage.  So it’ll be like your microservice, your microservice, and then you're like, “Hey, Todd right here, he's a robot and you can give him this work. And here's who’s responsible for this robot.”

It's a little freaky. I saw Lattice came out and said they're releasing a scorecard for AI employee performance reviews. And at that point I was like, “Oh geez, like this is really going to change.” 

But what we see… I think in 5 to 10 years from now, engineers are going to be orchestrators who are responsible for working with these agents and orchestrating them. I don't know how that's going to play out, but that's how I see it. 

I don't think engineering is going to go away because I think you still have to have a pretty…  like you have to understand a little bit under the hood. I think everyone takes one computer science class about chip design, I think it was called organization architecture for us. And like you learn about binary and how to work on PDPH.

Yeah.

You do that once, right? You're going to take one programming language class where you learn how to program language and like a data structures class... sure. But I think it's not really going to get into this orchestration movement. 

They’ll be living in the platform, just like you said. They'll be making pull requests and approving pull requests. Like that's how it's going to work. Which I think is pretty... it's a little scary, but it should be interesting.

Yeah. I have a thing that I go to, because I feel like I've been asked just so many times by people like, “How do you think it's going to affect the job market?” And I've got like… I don't know if it's a bleak answer… I don't know if it's bleak or positive, it's somewhere in between.  But it's like a bit of a conundrum. I think that if you genuinely are afraid of AI taking your job, it will. And if you're not, it won't. 

I think that that distinction of like I understand the value that I can provide to the system whether it's me typing to a compiler or me typing to a bot - the human element is still pretty important. I think that we'll get to a point where it gets much better at organizing the context and understanding the business than we do but like the amount of politics and culture that go into getting shit done in ops is significant. And I don't see that unless you’re like training it on people, like on your actual interactions with your team. 

For a job that is so technical, where we're as ops people so far from the product, we have a lot of people problems in our space.

[Laughing] So yeah, two things on that. 

I'm cribbing this from someone but like… I don't know if there's been a technology movement that hasn't been inflationary. Meaning it's grown the pie, even though at the time it feels like it's going to take it away. And I have faith that that probably will continue. 

The second is what you just said about that classification of problems.  Why I like cloud engineering and platform engineering so much is because you get this weird… not weird, but the tier of problem with a technology problem (which my joke is almost every technology problem is solvable that we encounter through either hard work or just really hard thinking.

Like we can always figure it out. It's not the biggest mountain to climb. It's a mountain sometimes, but it's solvable) colliding really hard with like a business decision that's being made (like this is going to have to start taking 70,000 requests a second) and then the people problem (where the person down the hall from you is like, “Well, that's not how we do things. I'm not doing it like that.”). 

That confluence and collision is, I just think, fascinating. I don't know where else it happens in organizations. And it's why I like talking to CIOs and CTOs, because when you hear about all of those insights and politics and tech problems - it's just really fascinating to me.

Cloud Migration Maturity Levels

Do you guys try to target a specific maturity in DevOps of the companies you're helping or do you get companies that are kind of all over the gamut? Like “We are cloud ready, we're ready to go.” and other people that are just like “We are in 1998 still.”

We get all over the place. What we find though is it's kind of like a bimodal distribution. 

The folks that are, “Hey, we're in the 1980s.” to use your language… and to my clients, if they're listening to this, you're not stuck in the 1980s… but they're typically the ones doing the cloud migration. So they're taking the first step. 

The folks that are kind of in the middle… where they've got a DevOps practice, they know what they're doing… we're kind of doing app mod.

Then the ones that are really far along, our cloud team might not be helping with them. We might be doing a lot of augmentation work with them, where we're just adding fuel to the fire. Or they're really working with some of our other practices doing more cloud native stuff. 

To this day, we're still seeing a lot of folks migrate. Like you said, people are still migrating. We're still running into, “What is CICD? How does CICD work?” And their expectation, which is I think fair, is that their time to value should be super small compared to how it was five years ago because all of these problems have been solved. Which is true, but sometimes what we run into is all the other problems that slow us down. 

Platform Engineering Approaches

Yeah. When you're seeing developers and DevOps teams interact with one of these new platforms to them, whether it's the Google Cloud Platform or something like Backstage, a lot of times they need to see how it fits into their world, how it fits into their mental model. There's so much new technology they're getting presented with. And then there is the interface that they're going to have, whether that's their GitHub actions and that's their interface to the world, or whether they're hopping into GCP and they're like, “You know what, we're just going to do click ops because this whole thing's new to us.” 

How is your team thinking about the platforms and the processes that you're going to give these teams? Are you approaching it in like a “Hey, this is the best way to run in the cloud” solution, or are you still finding that these teams are very different? I mean, they obviously are, but are you finding that they actually need different solutions? Like this team definitely is ready to move towards something like platform engineering and getting an IdP versus this team definitely just needs an M0 and something to automate their infrastructure and they're not quite ready for the full platform engineering gamut. 

How do you work through that with those customers? And what do you see in there?

I have a Slack alert for if anyone says click ops, I get pinged and I immediately say, “No, what are we doing? Why isn't my team involved? What do you mean we're doing click ops?” 

That's a joke, but what it comes down to is really the classification of project. If we're doing a data center migration for someone, we are going to push them to a platform engineering solution. We're going to ask them to make that leap during the migration because it's the easiest time to do it. We're instituting enough change as it is, this change for the better is fine. 

There are still some folks, smaller organizations, that are fine with just simple Terraform or  even some click ops, if they're doing kind of like proof of value and proof of concept type work. But our kind of bias is always to be getting that on the most robust DevOps platform possible, because we think it's going to give them more value out of being in the cloud and reduce headaches for them. Even if there's a little bit of a learning curve to get going.

It's the way - we always try to lead with that.

Yeah. Do you ever feel that introducing them to a platform could actually ease that learning curve? 

I feel like a lot of times that's one of the things that's kind of fun and scary about the cloud - there's so much stuff to learn. But if you're given like, “Hey, this is how I interact with the cloud. I've got some modules and Backstage that I use to get my different resources.”... like you've kind of lowered that surface area for them and made it a bit easier to understand.

I was actually really curious… is that like “Hey, we're going to get you on cloud and then we're going to bring this other big change later?”

100%. We see that often where like, “Hey, we're going to deploy new stuff to the cloud for the first six months and maintain two environments.” And they see how easy their life is for those six months when they're deploying new things. And then they're like, “Wait a second, I have to go back to the data center and figure out how to migrate all this stuff. I don't want to do that. I want to just build it new in the cloud.” 

So I one hundred percent agree introducing those tools and that thinking… and you can do that in formal classrooms, or hackathons is something that we really like doing, or just through osmosis of working with consultants hand-in-hand (there's the pitch obviously)... can make it easy. So I one hundred percent agree. That's a huge thing we see.

Common Migration Blind Spots

For the people that are kind of leading these migrations on the other side, what do you see as some of the common blind spots that they just don't… that you guys catch too late… or what is the common blind spot people have as they're starting to do these migrations to the cloud?

I would say the biggest blind spot is usually… it always takes a little bit longer than they initially think. There's always usually bugs that don't get resolved. 

It's not a silver bullet at the end of the day. If your application, before you make this migration, has latency problems, moving into the cloud might resolve some of them, but it also might not. So thinking about it as a silver bullet, just because you don't have to worry about the power going out at your data center or it flooding or there being a tornado or whatever natural disaster… it's not a silver bullet moving to the cloud. I think that's really one of the big things. 

The joke that I made, but it's true… I had a CIO who said, “I don't want this migration just to be a dashboard with the number of virtual machines that we have left to migrate.” Treating it as a like-for-like… and this is true of any change in an organization, whether you're rolling out a new platform… if you're treating it as a like-for-like, as opposed to a plus one on the other side, you're going to feel hurt in six to eight months. 

So those are kind of the two big things that we see.

That's really interesting. I feel like there are just so many potential blind spots but it's also one of these places where it can be hard to figure out what blind spots you even have because it's just so fundamentally different than where you're running today. 

Right.

I've seen this with customers where they're like, “We didn't even think we had to worry about that because we're moving to the cloud. They have everything.” And then they get there and they're like, “What do you mean there's not a managed service for this esoteric database?” And it's just like, “Well, there's not.” And they're like, “It's running on Solaris.” And we're like, “Ugh.”

Yeah.

You just created an interesting problem for me to solve. Right?

It can be very, very hard to find these sometimes because the thought model and the things are so different once they get there.

Yeah. No one has been thinking… these things are usually running by themselves at this point, like no one's really thinking about what platform has to change.  Even the idea of running applications in multiple regions inside the cloud is something that can be incredibly confusing and difficult for organizations to get their head around.  

We always tell our teams, “You have to tell them the best practices and really make sure they understand why we're doing it that way.” That's part of the gig. We are here to be that change agent and institute those best practices, even if it's a little bit kind of difficult to get through.

Multi-Region Deployment Considerations

I’ve experienced this in my professional services days, I'm curious if you see it and how you handle it, if so. 

You get this customer every once in a while… it might not be one of the upper stakeholders, but it's usually an engineer… who's like, “We actually do need to run active-active in two regions”, and you're like “You don't.”

[Cory laughs] Like you don't.  

Yeah [Alex laughs].

It's like, “What do we do if AWS goes down?” It's like “Well, I mean, most of the Internet's going to be broken, right?” 

I mean, there are definitely cases where it makes sense to run multi-region…

Sure.

But the teams that are like just a bit too overly excited about it  - like,  “This is what we have to do because this is true disaster recovery.” Do you see a lot of people that are reaching for that prematurely without understanding the effects of…

Oh, yeah. All the time. Literally this week we were having that conversation. Usually it's a DR issue. 

We try to empathize with why they're saying that.  Is there a business reason… like if we tie it back to the why. Like, “Hey, is your uptime needed to be that high? What happens if this goes down? How much revenue are you losing?”

Those are sometimes difficult conversations. Most of the time for us, the reason we might be advising them a different way is we think it's right for them. You know, we have our own constraints that we're trying to hit, and introducing that complexity is going to make it more difficult for everyone to get done. 

We've run into people that have said, “I want my storage system replicated across three different clouds so that I can switch over cloud storage to S3 to Blob Storage on Azure.” And we're like, “Why? What are you worried about?” Are you a Doomsday Prepper? If two clouds go down, we're all screwed. If you think about it… like we're at war, like something's bad.

Yeah.

Just having those conversations is the way we try to do that. And then, kind of like I said before, if you've earned trust early on in relationships when you're enacting these changes those conversations get a little bit easier down the road.

Sometimes people are just trying to… not show off, but make it known that they know what they're doing and that's a way to kind of earn their trust and advise them, steer them the other direction.

Yeah, I've had to do this as well and it is awkward because they're like, “But this is how we stay up forever.” 

Right.

And it's like, every single one of those nines that we add on to the end is more expensive than the last.

Correct.

You know, I've met companies where they don't care about the egress costs or the ingress costs but when you get to the engineering cost of like now your engineers… like that might be a pretty serious application rewrite to have a database that can be in two regions, right? 

Oh, yeah.

Unless you're doing like reads here and writes there. But like you’ve still got that single-ish point of failure, right?

If you have a CIO or CTO that isn't technical, like me, they might miss that engineering cost or the change that has to happen. I 100% agree, we see that all the time. 
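
The point above about each extra nine being more expensive than the last can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name is made up for this example, and it assumes a 365-day year and that "nines" means availability of 1 minus 10 to the power of minus n. It shows how quickly the yearly downtime budget shrinks, which is roughly why the engineering effort to hit each additional nine tends to grow.

```python
# Illustrative sketch, not from the episode: yearly downtime budget per "nine".
# Assumes a 365-day year; allowed_downtime_hours is a hypothetical helper name.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(nines: int) -> float:
    """Downtime budget per year for `nines` nines of availability (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** (-n))
        minutes = allowed_downtime_hours(n) * 60
        print(f"{availability:.3f}% availability -> {minutes:.1f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, so the question from the conversation ("How much revenue are you losing?") is really a question about whether that smaller budget is worth the added complexity.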

Trash Ops Lightning Round

I know we're coming up on time, if you have a few more minutes, there is a new thing we started doing. We just started about a week ago. And it is optional because I'm springing it on you.

We started doing this little thing that I call Trash Ops. It is an extremely long list of questions -  that is growing with everybody I tell about this because I think of funny questions. I'm just going to grab like three random questions and they're just to expose like the dirtiest secrets we have in DevOps. So I want to hit you with just three of them and see where we go.

Sounds good. I'll try my best.

This is going to be interesting.  First one. Do you name your servers… this will be in the Pets area for everybody that's thinking Cattle… Do you name your servers after characters from TV shows or movies or…?

I don't. I think it's kind of weird to be referring to machines like that. I guess I used to, and now I don't. When I was first starting out in my career, I had a Slackbot that was named after the robot from the movie Interstellar. I have subsequently graduated from that phase of my development, so I don't. I don't think many of our customers do either. I kind of just take the boring business name dash environment name. That's a question though. 

Nice. Okay, I love it. 

Have you ever rationed a hackathon allowance to pay for breakfast, lunch, and dinner?

I have not. 

What?

Yeah, I have not. What do you mean by that? Like, rationed as in pulled back?

Yeah, like let's say you're getting 50 bucks a day for your hackathon. Are you like, “Okay. I'm ordering breakfast, lunch, and dinner right now.” or are you like…

Oh yeah. Yeah, yeah, yeah. 100%.

Oh hell yeah. What's your go-to on a hackathon lunch?

I will do… I mean I think it's a little counterintuitive, but I always do pizza for hackathons. It kind of puts you to sleep, but it also kind of brings back the college in me where it's like you're just eating a bunch of crap food. I think it kind of helps fuel when you're hacking away like that.

It fuels the hacker mind. 

Yeah, you have to.

We did a hackathon last year and one of my teammates… he's the one that came up with this. He's like, “I was splitting every day. I was ordering breakfast, lunch, and dinner with that.” I was like, “I was ordering lunch and then a bottle of wine for dinner.”

The good thing with pizza is it's kind of all three meals…

It is all three meals.

In a sad way, but yeah.

All right, last question. Have you ever taken down production twice in one day?

Oh yeah. Two plus I would say. 

Two plus? Oh my gosh.

I've taken down production more than once in a day for sure.

Do you have a story that you remember?

It was legitimately not having good enough test coverage and I was deploying fixes to fixes, and it just kept on like breaking it. It was luckily an internal application so it wasn't the end of the world, but it was also at the end of quarter… that was where it was really needed.

I had to go back and… it was internal facing, but it was still customer facing… and basically what was happening is an order management tool…

Oh no.

Orders could still get taken, but they weren't being provisioned. Like they weren't getting invoices generated. All of the backend process wasn't happening. So I had to go back through the logs, like one by one, and extract all the orders as I was logging them to our cloud logging system to be able to place them by hand. 

So that was pretty painful. But that was probably the worst one I've had.

Oh my gosh, we could have got to that story the other… I shuffled it like right before you said you were down and the fourth question was “What is the most amount of times you've had to revert a PR before you're able to get a fix in?” 

I think I've done it three times before. 

God, that was painful. That's bringing up some bad memories, that's like going to a place in my brain that I don't know if I want to go back to anymore. [laughs]

Okay, we'll refresh it - If you could remove one database off the face of the planet, which one would it be?

SQL Server probably.

See, now we end on a positive note. We don't have bad memories come in. Everybody that's listening is like, “Yeah, SQL Server is gone.” 

Exactly.

Awesome. Alex, thank you so much for coming on the show today. I really appreciate the time.

Where can people find you on LinkedIn, Twitter, around the webs?

Yeah, I'm not too active on Twitter. I'm like a lurker of other people  - as most folks are on Twitter. But you can find me on LinkedIn or our website. I'm more than happy to talk about DevOps and all the stuff you're working on. 

Awesome. Well, thanks so much for coming on the show today. 

Everybody, thank you for tuning in!

Important Links:

Featured Guest

Alex Voorhees

Vice President of Cloud Engineering at 66 Degrees