Executing Well in Healthcare with Jessica Kalinowski
Implementing DevOps and Platform Engineering in Healthcare
Jessica Kalinowski, VP of DevOps and Corporate IT at connectRN, shares her journey from corporate IT to implementing DevOps and platform engineering in a startup environment. Jessica discusses the challenges and successes of applying tech strategies in healthcare.
The episode covers strategies for platform adoption, including early engineer engagement and flexible implementation. Jessica discusses how automation has enabled efficient management with a small team, benefiting the entire organization.
Welcome to the Platform Engineering Podcast. I'm your host, Cory O’Daniel, and today I'm thrilled to have Jessica Kalinowski, Vice President of DevOps and Corporate IT at connectRN. Jessica has an impressive background in leading cloud migrations and cultural change in multinational organizations. Having worked in healthcare myself and seen many healthcare operations teams struggle with the challenges of hiring and securing budgets in healthcare IT, I'm always fascinated by teams that are executing this well in the healthcare space. That is one reason I had Jessica on today. Jessica, it's great to have you here.
To start, could you tell us a bit about your early career and what led you to focus on platform engineering and DevOps?
Yeah, absolutely. Thanks. I'm excited to be here as well.
I actually started in quality assurance. I was just a software tester, wanted a more stable job, got trained. I was like, I can do this. It's no problem.
Slowly in my early career I realized the need for things like automation and started really focusing on problem solving in organizations. I think I'm naturally just drawn to the crud and I like to fix it.
My first big challenge was, you know, migrating an entire organization from QA to quality engineering and automation. Kind of working my way up through leadership into that space and then fell in love with the concept of cultural and organizational change.
And that led me… I was in healthcare always, but the early part of my career pretty much was spent in healthcare. And the problems on the back end of that are really interesting to try and solve, in a heavily, heavily regulated environment.
I took a step away from healthcare to take over a site reliability team. It was a really good opportunity and while it was completely foreign to me, the concept of SRE was still pretty new then. I was familiar with the cloud. We had done some big cloud migrations and I decided I was up for the challenge.
That is really when my intro into the concept of platform engineering and site reliability as a whole came to play. And then I just built off that, figuring out more ways to automate and get into the concept of platform.
I think big companies are like platform engineering. Startups are kind of like, we build stuff in an automated way. I left big corporate because DevOps is really hard to implement in a large organization that has done the on-premise shift to the cloud. And so I was hitting a lot of walls in doing true DevOps and automation.
And I did it full suite here at connectRN and I'm loving it.
Very cool, very cool. Before we hop into talking about connectRN, I'd love to rewind to that bit there when you said you saw teams moving from on-prem to the cloud struggling with DevOps.
Yeah.
I see this constantly, but I would love to just go off the book. What do you think is the biggest challenge? That cultural shift where it's like, I have all this stuff on-prem, I'm moving to the cloud… what is the biggest challenge there? Because it seems very common.
What I saw initially in the healthcare space is when the cloud was… you know, 10 years ago, cloud and Amazon was the big thing and everyone's like, I'm going to go to Amazon or I'm going to go to Azure. The idea was to lift and shift these very large healthcare applications to the cloud. It was just going to work and we were going to save a ton of money. And that is not how the cloud is built.
We had seen some pretty big projects where they literally would say, “Okay, it's a SQL database. We're going to lift and shift this entire SQL-dependent application into the cloud and run it just as it stands today.” And that's not really what the cloud is built for. You have to optimize your applications to work with the cloud technology. And so we had moved all these applications using the same concepts of like old school change management and ownership and all these things into the cloud. And it failed miserably.
In fact, probably the best example I have is we lifted and shifted a giant claims database. The whole premise of this application was just to adjudicate and run care measures. If you're in the healthcare space, you understand care measures. So you're looking back 10 plus years for every patient on this data all the time. And on-prem, it ran in maybe eight to 12 hours. And you're talking about terabytes of data it's processing through. And in the cloud, it took four to five days and cost them tens of thousands of dollars. And it was an instant flop.
We had messed up. It was the wrong idea. In some cases, you can't just take this on-premise beast, stick it in the cloud, and apply on-premise principles to it. It's not how the cloud works. And I think that's where I see a lot of companies fail. They want to apply on-premise rules, roles, responsibilities, and processes to cloud technology. And you can't do that. You have to change your way of thinking and operating when you move to the cloud.
Yeah, that's interesting, right? So it was like, we lifted and shifted, but it was very obvious that the metal, the VM underneath it is…
Yeah.
You just have different resources and it probably warranted some app modernization, but that kind of falls on the other side of what you're saying. You have to get there successfully before the company is going to reinvest more into this culture of DevOps and being agile, right? So when you see something explode, if it's like, “Well, we just didn't architect it right.” It's like, “What if we just move it back?”
We did, we moved it back. I think the early learnings from those types of projects… ultimately they rebuilt a system, but it was a lot of learning.
The cloud is a powerful beast, but what is it made for? Like how does the cloud look in any organization? It's much easier in the startup space to apply those principles because many are starting cloud native. But 10 years ago, cloud native wasn't it. You were trying to take everything you had and apply it to the cloud.
I think there was a lot of learning along that path. And when do you say we didn't do it right and like cut, we’ve got to go back. And that's a hard pill to swallow for a lot of organizations.
Yeah. I got to work with an organization a few years back that had a similar-esque story for a lift and shift. I don't want to say they failed, they definitely got there. It took longer than intended. But if I would have been the engineering manager there, I would have considered it a failure. Because I think they ran into the other side of it. Like, they brute forced getting there and they got there. They ate some costs. But at the end of the day, once they were cloud native-ish, a lot of those operations folk (that weren't software developers, weren't cloud engineers) left.
That investment still wasn't there to upskill them and make them cloud engineers. It just kind of was like, “Oh, you got us here. Now take those Linux admin and router skills and apply it to the cloud.” And they left. And that team lost them as their entire operations team was just going back to working data centers.
Yeah, we went through that at one of the companies I worked for that was a finance company. They had kind of two SRE teams. One was focused on the on-prem, which is interesting in and of itself that you have the idea of site reliability, but for on-prem systems. And the other team was completely in the cloud. And so it was two parallels of a universe that didn't unite.
When I was asked to take over the whole operation and bring them together - like we're going to migrate, we're doing all these things, let's align on our operations - that was a large challenge. You're in two completely different skill sets.
The on-prem world is built on process, procedure, and red tape. Like check the buttons. If you don't check all the buttons, you're not doing what you want to do.
In the cloud, it's a very different culture, right? We still do the audit procedures, but almost everything is automated. Change management is automated. You're trying to move faster and shift left. In the on-prem world, that doesn't really work. And that's a large part of the cultural challenge.
Yeah. Rolling out DevOps initiatives and platform engineering initiatives at earlier stage companies is also a challenge, right? There's this challenge of like, if you've waited too long and got too big, it can be hard. But then there's getting people to figure out when's the right time to start buying into it at an earlier stage company.
I feel like this is exacerbated in healthcare just across the board.
Yeah.
When you joined connectRN, was this motion already in process or is this what you joined to kind of help move forward?
I was brought in to do it. My original title was just Director of DevOps. They had no cloud team. I think they had the best intention of applying those principles, but had hired a lot of folks that came from big healthcare and kind of legacy practices.
When I joined, they were very much in a waterfall methodology hidden by agile terms. Which we see all the time, right? Everyone uses the term agile and DevOps. And if you're releasing every two weeks or every three weeks, then you're not really DevOps - for sure. And you're not really agile.
When I came in, it was to break down those processes. The thought process around it and to figure out, how do we move to the left. And a lot of that came from the concept of platform engineering, right?
Apply platform engineering, bring in a cloud team. And while they're their own little bucket of work, they spread out with engineering, they're there to help solve problems. They really have embedded themselves across all of the teams to make sure that what we're building works and we have buy-in. I think that played into the success.
ConnectRN was already cloud native, which was a huge help. We weren't dealing with any legacy systems.
I think from my learnings and my past experience, the first thing you do is get your executive buy-in before you move one step further - to ensure that you're going to be able to do what you need to do to get things up and running. It's an investment. And I think startups struggle at what point to make that investment and commitment to the long-term wins that you get from applying those principles.
Getting that stakeholder buy-in, that's another one that's pretty interesting. Because I feel like it can be hard to do, especially outside the engineering org. If you're the lone CTO of a startup… especially if you hit Series A, Series B… it can be hard to get people around you to be into the idea. You have to communicate it to them effectively, get them to understand why this is important for the org.
While at the same time, I feel like just a little beyond startup, you're kind of seeing a lot of organizations are kind of the other way. There's executive buy-in first, and it's almost dictated - we're moving to platform engineering, this is the way that we're going. And teams are like, “Oh, are we? Is that what we're doing?”
That problem sucks, but the problem of just getting that stakeholder buy-in… what are some techniques you've used to illustrate the value of platform engineering and DevOps without necessarily like starting to build out projects or starting on initiatives?
Yeah, it's a good question. And I think it's a little bit different in every organization. But the key is understanding where the business wants to go.
Platform is tough. It's like IT, right? We make you zero dollars at the end of the day. We are not a money-making organization. But what we offer is time to value and other things that once we get things into place, things are rocking and rolling and you're getting what you want much faster. Putting that in front of executives, who are looking at project deliveries and ROI - how do you explain that if you give me six months, I'll make you faster.
So for me, it really boils down to understanding where they're trying to get and then putting platform terms into things that they will understand. Like if you tell them you're going to use Terraform or Kubernetes or something, they don't care. They have no idea what you're talking about. They don't understand the concept. And so we keep it, I would say, pretty simple. We focus on simplicity and security and delivery. When I'm presenting what we're going to do and why we should invest our time in that, it's focused on business outcomes that we will achieve by doing this and what is the long-term of that.
That's really how we were able to get by and across the board to move in a different direction here. And again, like I say, automate all the things - that's my job. I just automate all the things, my team does the same.
But the other key focus is we kept the team really small. Platform engineering has a really awesome benefit in that if you invest a little bit of time with some really good resources, you don't need 20 people to run a platform. You can automate it. You can train engineers. You can really utilize resources in a way that is not big healthcare, right?
A lot of big healthcare IT is empire-building and these large teams - you always need more people. And so we really focused on the concept of staying small.
Like how small can we stay. Build this really repeatable, reusable, stable infrastructure that engineers could then use to their benefit, but we didn't need to hire more people to maintain it. And that's what we did. So I have the same team I started with two years ago.
That is awesome.
What was the adoption story like for the engineers using your platform? Were they mandated to move to the platform [Jessica laughs], or were you working with champions on each of the teams? I love the laugh. I have a feeling where this is going now.
I'm laughing, yeah. There are some stories, right?
So being in cloud, you think everyone wants to do this. Like we're telling you, “We're going to build this thing, you're going to be faster. It's going to be awesome.” And they're like, “Whoa, what are you talking about? I don't want to use something that you built.” You'd be surprised.
[Cory laughs]
Well, you're not surprised. I can tell by your laughing, but the engineering adoption is kind of hard.
It is very hard.
In the beginning, we failed at that. We had the executive buy-in and we did not have the engineers' buy-in. And guess what? You will not succeed without both.
Yeah.
We went back to the drawing board and said, how do we get them to adopt these things that we're building, without them feeling like we were forcing it on them.
So, we started to engage much earlier - much earlier in our ideation, in our testing, in our scoping out what we were building so that they felt like they had input into what this end state was going to be. And that made a world of difference. Just that they felt heard, seen, understood - they were playing around with it much earlier.
The same for our quality engineering team. When we said we were turning on continuous delivery, they were like, “Whoa, whoa, whoa, whoa, what are we doing?”
[Cory laughs]
And so we literally embedded a cloud engineer with them to help them integrate their automation, beef up their automation, and really understand what we were doing.
Now we are CD and it's a pretty cool thing to see people realize the benefit now. They were terrified at first, but now they're like, we have so much more time to focus on improving rather than the day-to-day releases going on.
It's a big shift. Getting to that stage of continuous delivery and engineering adoption in a platform and infrastructure is a lot harder than people think.
It is extremely hard. We had a similar problem where we had a portion of the company that was happy to adopt, but then there were a couple of holdout teams.
I feel like the role… it's so funny, like compared to like a product-facing engineer who's working with PMs and working with customers… there's a lot of customer interaction there, but I feel like the politics and culture of an org is really on that DevOps team. And I feel like we deal with a lot more socio-technical (particularly) hurdles to get over to get our job done.
I say my job is sales. I feel like my job is sales. Technically, I have really smart guys. I have a fantastic team and they're building fantastic things. And I would say at least 50% of our job is sales. It's selling other people on this cutting-edge, crazy thing that we want to do that changes the dynamic of how they work.
When you get it, though… when you get things rolling and you build that trust, it creates a lot more opportunity. We're working with some of our hospital onboarding teams. We're working with facilities, automating things. So my DevOps and cloud team now has built this platform concept that's extending to sales and onboarding and HR. I mean, everyone reaches out to us to say, “Hey, we don't want to do this anymore. How do we automate it?” And I think that is like the coolest part of my story.
That is awesome.
The ability for DevOps and operations engineers to have a magnificent impact on a business is there - given they have the opportunity. And I feel like that's one of the things that's so hard, getting that buy-in, getting the organization around what you're doing.
Yeah.
But give us the wings and we can fly. And I feel like it's so hard to get them sometimes.
So with these other orgs that were kind of coming along and saying sales org needs some help from you all or facilities - was the rest of the org hearing how you helped make the engineering team efficient and they were starting to come to you? Or is this still more like sales marketing outreach across the org?
50-50, I would say.
We have a really healthy dynamic in the executive leadership team and I'm a problem solver by nature. What's interesting in this is a lot of the larger companies that I worked for would ask me, “Why do you keep trying to solve these problems? All you deal with is problems.” And I actually love it. I don't know why I'm a glutton for punishment, but I love it. I love solving these hard problems and not accepting that statement of “We've always done it this way” as fact. Like, okay, but we could probably do it better.
When you open the concept of DevOps, and stop thinking about just infrastructure, it really is just reducing toil, right? If a human is doing something that doesn't really provide value, can we stop doing that? How do we free them up to do cooler things and innovate and spend time on other items.
So when we were having these discussions about the business and what was happening and where teams were struggling, I would volunteer my team. I’d be like, I bet we can help you solve that problem. And from there, and a couple of really cool problem solves to unique items that they would never expect the cloud or platform team to be able to do, we became the go-to.
Now it's, “We have this thing and we're trying to figure it out. Jessica, can your team help?” And we do. It is a really awesome thing to see come to fruition.
I think it's hard to do, but again, like my entire reason for wanting to go to the startup world for the first time ever was, can I prove that we can do this? That you can really do it at a company in a healthy way, in a meaningful way. And it's been a really awesome journey.
That feels... I'm sorry, I guess I'm feeling it like by proxy, but like there's so many teams that I've seen outside of the engineering org where they need an engineering resource. It's like the sales team - we'd love it if we had this feature. We're an e-commerce site (or whatever site), but if you built this automation software for us, it'd make us more effective. And it's like, “Okay, well get in line for the software developers.”
Yeah, they never have time.
You've gotten to this flywheel point where you are literally saying, my ops people, which generally don't have the time to do their jobs in most orgs, have so much free time that we can help you too.
Yeah, because we automated everything. We invested so heavily.
When I joined, we didn't really have IT - you know, typical startup. My boss is a fantastic CTO and he was like, congratulations, you now have IT. And so we applied DevOps principles there. We automated everything. We automated onboarding, offboarding, help desk… so my IT team is two, that's it. We manage the entire company with two resources. And the cloud team is three, three plus the security engineer. But we do it all with a very, very small presence. And I think it's shown why investing very early in doing the right things and kind of figuring out what works for your organization will pay off immensely in the end.
It's been probably the most rewarding job of my career. Reaching this point where we can say, “We don't need more people. We're okay.” And we're still innovating and we're doing all these things and we're solving all these problems with this little tiny team.
And my gut is… I'll let you answer, I won't go with my gut - your team's morale is probably through the roof.
Through the roof. I haven't lost one person in two years.
Yeah, that's amazing. That is amazing.
I feel like that's another thing that's so hard for many ops teams - you're constantly a little bit underwater. Like you get above and then you go below, you get above and you go below. That wear and tear on somebody - I feel like it's one of the things that drives us to have like an 18-month tenure, right? Like a 30-month tenure.
It's hard.
Rewarding aspects can be pretty few and far between for many operations engineers. Where they get that pat on the back of like, you actually did something for the business.
Part of that is on me. And I think this is where IT leaders in the platform space struggle too. My job is to pave the way for them. I call it block and tackle.
I block and tackle. They come up with these really cool ideas. I make sure they understand where we're trying to go as a business. And then I just block and tackle. I get people out of their way. That's my job.
We don't always win. We don't do every single thing that we want to do. But I think the first 8 to 12 months of really putting in the hard work to figure out what would work for our organization, our engineers - knowing that we wanted something that was easy to maintain, didn't take a lot of effort and engineers could participate in. And so we spent a lot of time ideating on that and failing and then winning and failing and winning. And it created such a positive culture on the team in knowing you can misstep. It's okay. We're all going to learn. We're going to grow from this. If you misstep, just like, how do we not do it again?
When you remove that fear of failure and allow your team to really ideate and solve problems, they thrive. They thrive. They love it. Actually, I joke with my cloud manager that he's part payroll ops because we're solving problems in that space now. You know, Senior DevOps Manager and Payroll Specialist. He hates when I say it, but it's true. We can solve problems there and we are.
Security and compliance is important everywhere, but particularly in healthcare. I feel like this is one of those things for a lot of orgs - maybe they have security people that are similar to many ops people that are on-prem, like they understand security principles, but they're not software developers. They might know how to run tools, use tools, but day-in-day-out they're not writing code.
I love seeing compliance and security be a part of people's platform, like a part of the product that's extended. How do you all think about security and compliance as a part of the platform that you're developing?
It starts at check-in, for us and the engineers. DevSecOps to me is a great concept. We knew really early on when we were building our pipelines that we wanted to think about security. Security is also very important to our organization and my CTO. And so we built it.
We found the right tools to meet us where we are, I think. There's a misunderstanding for startups that you need to go buy like the most expensive, well-known tool. And we've actually found a lot of our success in partnering with other startups in the DevOps space that are kind of ideating and learning.
So we had code scanning on check-in, we implemented security as part of our pipelines, we created… coming from big healthcare, if you're in DevOps audits are the most painful thing on the planet. No offense to the auditors out there, but Lord, you give me a headache.
[Cory laughs]
So we wanted to automate as much of the audit process and the evidence process as we could. So how do we do change management in the pipeline? How do we make sure that like we can check that audit box? And so we automated all of that too.
Minus the policies. We had to put all the policies in place, but we really worked hard to make sure that all of the policies focused on DevOps. We had the paper trails and the things that we needed at the end of the day to meet all of the requirements. And in March/April of this year, we achieved SOC compliance.
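[Editor's note: the episode does not say which tools connectRN used, so the following is only a minimal sketch of what "change management in the pipeline" could look like. Every name in it (`ChangeRecord`, `build_change_record`, `evidence_line`, and all field names) is hypothetical. The idea is that a pipeline step refuses to deploy unless the change has an independent approver and a passing security scan, then emits a hashed evidence record that can later be handed to an auditor.]

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """Audit evidence for one deployment (hypothetical field names)."""
    commit_sha: str
    author: str
    approver: str
    scan_passed: bool
    deployed_at: str  # ISO 8601 timestamp in UTC


def build_change_record(commit_sha, author, approver, scan_passed, deployed_at=None):
    """Enforce two illustrative policy checks, then return the record."""
    if author == approver:
        raise ValueError("author cannot approve their own change")
    if not scan_passed:
        raise ValueError("security scan must pass before deployment")
    timestamp = deployed_at or datetime.now(timezone.utc)
    return ChangeRecord(commit_sha, author, approver, scan_passed, timestamp.isoformat())


def evidence_line(record):
    """Serialize the record together with a SHA-256 digest of its contents."""
    payload = json.dumps(asdict(record), sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"record": asdict(record), "sha256": digest}, sort_keys=True)


if __name__ == "__main__":
    record = build_change_record("abc123", "dev", "lead", scan_passed=True)
    print(evidence_line(record))
```

[In a setup like this, each deployment appends one such line to an evidence store, so the audit trail accumulates as a side effect of normal releases rather than being assembled by hand at audit time.]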
That's interesting, you said, right from check-in, like check-in of code, not check-in of patients. Well, maybe check-in of patients as well. But that investment right there of putting the code scanning tools in place, that's one of those things… it's tedious and it's annoying, but it's a time creator, right? That's one less person that's stopping to tap a security engineer. Or a security engineer finding out about something three months from now and then grinding that team to a halt to fix it, because now it's an emergency three months-in or the day after a breach.
Yeah, it's a culture change. I think you can decide that security is important. We implemented a certain project that required that we get SOC compliance. I've always come from larger companies that had it, and we were just doing renewals. So I took my pain from that and said, I don't want to do the same thing again.
I don't want to spend three months with my team running queries on people's access levels. So how do we build things in a way where it's just a part of our day? It's not like, “Okay, the auditors have things. Let's upload the evidence.” We already have it. And so, it was a lot of learnings and a lot of the cultural shift of why is security so important from the get-go. It's not so much about having red tape. I think we try to avoid the red tape and make it as easy as possible.
Earlier you were saying that coming from the corporate IT world to the startup world, you were able to bring a lot of that knowledge from that side of the world. But the lessons can be very different between smaller organizations and corporate IT. What did you find at the larger corporations that you were able to apply and get benefit from at an earlier stage company?
I think that for me, it was knowing if we didn't have alignment in the beginning that we were never going to achieve success. You hear a lot in larger corporations like, “Oh, it's a grassroots effort to be DevOps.” Well, guess what? You're eventually going to hit the tape where they're going to say, “No way, we're not spending money on that. We're not changing our process for that.”
Part of the concept of platform is empowering your engineers to use the platform. It's not good if they need your help all day long. So in large companies, it was very hard to get the investment of time from engineers. When they're working on other projects, they don't want you to interrupt timelines. Right?
Yeah.
Even in the startup world, you're moving super fast. So how do you build in what you're doing to kind of align with where the organization is going at the speed they're going. And I think that caused us to be very cognizant and very flexible. So yeah, we wanted to flip the switch on those freaking pipelines as soon as humanly possible and see what happens.
But we had to do a lot of give and take to build that trust to say, “If you don't feel ready or we feel like it's going to impact this project. Okay, how about we do it here?” And so we tried to be very giving in our timelines, but also ensure that the work that we were doing was in everyone's face. We were talking about it. We were talking about the benefits. We were putting numbers to it in what it meant to the end customer and what it meant to the company.
That was all from my learnings as a leader in larger companies. That if you don't put it in front of them and you're not talking about it and putting data to it and measuring it, they're not going to get excited about it. You're just in their way, right? You're a time suck.
Yeah, being able to market that information back to the bigger org to understand your impact. I feel like it's important, even if you aren't even starting to do platform engineering, if you're just like in the earliest stages of DevOps. That can help get that momentum, if the rest of the org is understanding it in their language.
I remember there was a job I had maybe a decade or so ago and the CEO one day came by my desk and he's like, I don't understand what you do here.
[Jessica laughs] Nobody ever knows. It's like the Bobs from Office Space. What do you do here?
It's like, “You know how the site hasn't been down in a very long time? That's what I do. When it's broken, come and yell at me. I'm the one.” But I feel like it's one of those things that's hard - where the operations engineers and DevOps engineers are a mini org so removed from the rest of the org that it's hard to understand what they do.
It's hard to interface with them. Like not realizing that there is value that they can add to your team, right? But I think, in general, we don't do a great job as operations and DevOps engineers of kind of boasting to the org about how our work is making the engineers' products better.
You have to get them out there, to be honest. We held a lot of lunch and learns. We really tried to integrate with teams as much as we could, so that they understood that we were listening and, while a quiet force in our little corner, we were trying to solve the problems they were struggling with. And that takes time. It takes time to build that trust.
It's easier in small companies where the culture is still fairly moldable and changing, and you're kind of in this exciting, fast-paced energy in the organization. But you still have to do that work. That's why I say, I think 50% of our job is sales, right? Like, what can we do for you? Let me show you.
Once you get those projects out the door though, and people start seeing… and to your point, I had a CISO who said, never let a good security incident go to waste. And that stayed with me. We had an incident and that was the moment.
We had an incident where a login was broken on our platform, I think. And that's like, the world is on fire, right? Nobody can get in. And so we threw the cloud team on it, we had just implemented a real incident management process. And we recovered maybe in 30 minutes - which the prior time their login had gone down, it was something like 24 hours.
Oof.
Like really bad.
Oof.
And that was the moment where they were like, “Hey, you guys know what you're doing.” Some of that was because of infrastructure and logging we had put into place, that just allowed us to find the problem faster. And some of it is because I think when you're in SRE or platform engineering - I don't know what is in the DNA of people that do this. They're insane problem solvers. Like they walk this little path like they're on their journey to gold and they can find things that engineers don't see. Engineers are looking at the code and in cloud and platform, they're just like, “Where's the break in the matrix?”
Yeah.
They can find it so much faster. When you focus on things like recovery versus whose fault is it and all those things, the benefit of what we do starts to show. And then we fix it so that break in the matrix doesn't come back again.
When the stability of the platform increased and we really started seeing the time to value increase, everybody was like, “Hey, I want to go faster. Can you help?” And we do. It's great.
So there's a lot of tools in this space - I feel like in engineering in general, there's just always tools coming at you. But especially in the cloud native space - just hopping on and taking a gander at the CNCF landscape will make you dizzy.
Your team is obviously very efficient, like to the point that they're helping teams outside of engineering. How are they looking at new tools coming into the space? Are you proactively looking for new tools to solve problems or do you only look as new problems arise?
I think we're always looking. I think DevOps and platform engineering is a curious space. I say this to a lot of startups and I think it applies to big corporate as well - meet yourself where you are. Don't buy the most expensive tool because you have decided that's a really cool thing that you want to do, or some sales guy came and talked to you. If you don't know that you're going to use it, you don't know how to use it or integrate it into your world - do a little dance with a low-cost budget that doesn't hurt you if it doesn't work.
I think there are so many similar tools in this world. There are some that do it really well that might not be in your budget, and that's okay. You can find one that's not… like I said, for our observability, we partnered with another startup. It's been one of the most beneficial because they were growing, we were growing, and so we just solved problems together. And it was a really low-cost way to improve our insights into our platform.
Yeah, I think if you do platform engineering and you truly are making a product that's abstracting some of the stuff away, it also gives you a little bit of room to experiment or A-B test without necessarily having to chase down 45 pipelines to change something.
Yeah, absolutely. When you get to that kind of healthy stage where things are automated and you're not doing a lot of like day-to-day tasks, my guys have a lot of time to innovate and play around and see what the next thing is that we should be doing. What's the next efficiency we're going to tackle? And in that comes, potentially, new tools. So we try to keep an open mind and a low budget with what we're looking at.
With your engineers using the platform, what is your platform team looking at day-in-day-out? Not necessarily the tool per se, but like what types of numbers and metrics are you looking at to tell how effective that platform team is continuing to be? Do you have KPIs that they're aiming for or is it more on the developer happiness side?
It's both. For us, it focuses somewhat around DevOps metrics - so time to value, mean time to release. And then obviously how much are we rolling back those releases? We also, because I own quality engineering, are looking at how many of our pipeline failures are because of a bad test or bad code.
That's a really important metric to look at - are we finding issues (like we're stopping releases) because we actually have a problem or is it the test that is flaky? Because that's a failure on us too. So we tried to look around at the clear metrics that a lot of DevOps folks follow - how many deployments we have, how fast are our new features getting out there. But also like for our team - How many of those problems are infrastructure problems? How many of the problems are test problems? Because those are things that we should be evaluating - we're not doing something right if a majority of the challenges are in that. And then, of course, around incident management and recovery.
Time to recover is one of our biggest metrics, and it's been one of our biggest improvements. A lot of that is from like the logging, the stability and resiliency of the platform - having multi AZ databases and things like that. Again, while still trying to stay very budget-friendly.
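The metrics mentioned here (rollback rate, the share of pipeline failures caused by flaky tests or infrastructure rather than code, and time to recover) can be sketched in a few lines of Python. This is only an illustrative sketch, not how the Connect RN team actually computes them: the record types, the cause labels ("code", "flaky_test", "infrastructure"), and the function names are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    rolled_back: bool  # True if this release was later rolled back


@dataclass
class PipelineFailure:
    # Hypothetical labels: "code", "flaky_test", or "infrastructure"
    cause: str


@dataclass
class Incident:
    started: datetime
    recovered: datetime


def rollback_rate(deployments):
    """Fraction of deployments that were rolled back (0.0 if there are none)."""
    if not deployments:
        return 0.0
    return sum(d.rolled_back for d in deployments) / len(deployments)


def failure_breakdown(failures):
    """Share of pipeline failures per cause, as fractions that sum to 1."""
    counts = Counter(f.cause for f in failures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cause: n / total for cause, n in counts.items()}


def mean_time_to_recover(incidents):
    """Average time between an incident starting and being recovered."""
    if not incidents:
        return timedelta(0)
    total = sum((i.recovered - i.started for i in incidents), timedelta(0))
    return total / len(incidents)
```

If a large share of the breakdown lands on "flaky_test" or "infrastructure", that points at the platform and quality side rather than at application code, which is the distinction being drawn above.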
So with your platform, what is your developer experience like? What does their interface to your platform look like, I guess?
I mean, it's kind of every day - we're a startup. We tried to build our modules and tools so that they were easy to maintain and readily available. So everything is done in Terraform. The infrastructure is code so they're repeatable patterns that engineers can kind of plug and play.
Anytime we're testing out a new concept, we engage them very early. So we build the concept, we give it to them to play with and give us feedback. So by the time we're really rolling something out to the wild, they've already used it so much and we've taken their feedback into consideration.
Right now we're doing Kubernetes because we wanted to be able to spin up an isolated ecosystem for engineers to test. And that really came from their frustration. I think most of our ideas come from asking them, “What's holding you back? What's frustrating you? What's slowing you down?” And then going and solving that problem. And so we built a Kubernetes ecosystem where they can just deploy the entirety of our app stack and test in a little isolated bubble, and they love it.
That is great. That is one of those things. Having a monolith, very nice. Having microservices or services, very nice. Testing microservices and services, not very nice. [laughs]
We're somewhere in the middle. We have like microservices and a ginormous repository. It's like 50-50. So we're still like slowly chipping away at that problem. But in the interim, whatever we can make easier for them… you know, having test suites available in their local environment, giving them access - go run it, go ahead, let us know if something doesn't work. And so it's really just focused on enablement and feedback loops.
I think that is the crux of DevOps and platform engineering, it's enablement and feedback loops.
Enablement and feedback loops. And you started that from the beginning. Like from your surveys - you were giving feedback to developers, getting them to incorporate it.
When starting to get into platform engineering, people are always like, “What's the first step?” Talk to your developers and see what they want. And it's just like, “We sent them a survey, and they don't like filling it out.” Because we never talked to them about the feedback they gave us.
But that engagement, that's what gets them excited, lets them know that they're being heard.
It is. And I think a lot of early platform engineering teams (again, I learned this from my big corporate experiences), they were built in a bubble. Like we're going to put you in a castle over here with like security at the door, and you can't get in, and you're going to ideate, you're going to build this whole thing. And then people are just going to use it. Like you have to force it. And it was the wrong way.
It was the wrong way to operate, right? You're building something in a vacuum without engaging the people that have the challenge. And that doesn't work in principle if you're talking about large adoption.
Anybody, us included, right? If your boss came to you and said, “You're going to do it this way. And I built it and that's it.” You'd be like, “This guy's an idiot. Like, what are we doing? That doesn't meet anything that I need. My team's not getting what they need.”
That feedback loop and lack of ego is really important in a cloud team. And our engineers contribute - they contribute to our repos, they bring us ideas, they want to partake in it now. And I think that is a huge sign of a healthy platform and DevOps culture.
Yeah.
We have engagement. We may not always do it exactly how we want, but that's that middle ground that you come to to continue that healthy culture and adoption.
Yeah, I really like that.
Something I would love to know is, particularly in healthcare where you have so much data, how is your team thinking about the AI tools and AI services coming to market? Is it something that you're looking to incorporate into the platform for your developers? Are you starting to look at it for just yourselves?
Yeah, we are. ChatGPT is probably the name that everyone goes to.
I own security and I was like, “Oh man, here we go.” Like, how do you build boundaries around something like this? Like what data are they putting in? Do we monitor all the data? But then that's going against every DevOps and enablement principle you have. And so it's a balance.
I think it's a balance that's going to unearth some interesting finds in the next couple of years as people incorporate more AI.
So yes, we are using… you know, like GitHub has Copilot. Copilot does a lot of menial tasks when it comes to coding and filling in the gaps. That's a really cool thing. Saves time. We are getting ready to implement a chatbot in our app as an option of support. And I'm really excited about that.
We want to build it in a platform way that is not like many of the big companies. My goal for the team was - don't make it something that people are going to want to throw their phone out the window. We've all had that experience with a chatbot where it's like, how do you get a human to talk to you because I'm getting nothing here.
Mmm Hmm
I think what we're doing in that space is really unique and it's customer-centric, in the hopes that it helps other teams in not having to answer the same question all the time.
I think security and AI is going to be the big thing for the next few years. How do you utilize that technology and protect your platform, your applications? I don't think we have all the answers yet. But as I said, one good security incident, it'll show you the way and we'll figure it out. So I think there's still going to be a lot to learn in that space.
There's opportunity in security incidents. [laughs]
I think these days it's not a matter of if, it's more a matter of when and how fast can you recover. We can't put ourselves in a bubble anymore. The technology is too broad, but if you have good monitoring and good incident practice, you can recover very fast. And so we focus on that more than - How do we never allow this into our ecosystem?
That's kept innovation at the forefront. While AI is a different space that I think we have a lot to learn, it's a little scary especially for larger companies. I think we're able to embrace it knowing we've put the foundation down to manage it as a new concept.
Very cool.
I'd love to ask you one last question. You have this unique experience where you've been a part of the huge corporate IT machine. You've come to a startup where you've absolutely shone in building out a platform team. I wouldn't say if you went back to corporate [laughs], but for folks that are in… it's so hard to go back.
Yeah, it is.
But for the folks that are there… maybe they're on a DevOps team or they're on an old school ops team and they're trying to move and become more efficient… What would you say is the one thing that you've learned from being at a more agile startup environment that you think you could really use to push a DevOps team forward?
Don't stop trying. I think that you need to acknowledge the level of sales that is included in this. And it really goes to aligning DevOps and platform principles with your company goals. And when you can start to marry those two and talk in a language that will help folks understand what you're trying to build… don't show them an ecosystem. Don't do an architecture diagram of the cloud. Nobody is going to understand what you're talking about.
One of my former colleagues and I had a presentation and it had a giant floating duck. Then we showed the giant floating duck on fire and we were like, “This is where we're going. We can help.” Honestly, I still have the presentation. It's one of my favorite slides ever because we had tried to pitch it so many times in so many ways. And finally we were like, “You know what? This is where we are. This is where you're going if we don't change.” And it is what won. Honest to God, it's what won. We still talk about that duck all the time.
Heck yeah, light those rubber ducks on fire. I love it.
Well, thanks so much for coming on the show. I really appreciate the time. Where can people find you on the internet?
I'm on LinkedIn. I think that's probably my most active when it comes to work things. So LinkedIn is my general way - I have blue hair on my LinkedIn profile. It is my norm when the colder months come and I put it back in.
Awesome. Well, thanks so much for the time today. I really appreciate it.
If you haven't got to check out the series on the History of Cloud Operations, I had a four-part series earlier this year where I got to speak with Mark Burgess of CFEngine, Adam Jacob of Chef, Mitchell Hashimoto of HashiCorp and Terraform, and Brian Grant of Kubernetes. So definitely a great series to check out.
Thanks so much for your time. Have a great day.