DevOps/SRE model: Bursting the developer’s bubble

Date: October 7, 2020

Author: BigPanda

Many organizations are transitioning toward a DevOps operational model, where software developers, rather than a centralized IT operations group, are responsible for operating the applications they develop.

In this “CTO Perspective” interview we talk to BigPanda’s CTO Elik Eizenberg about the challenges in that transition, and what it takes to make it easier.

Read the skinny for a brief summary. Then either lean back and watch the interview, or, if you prefer to keep reading, take a few minutes to read the transcript. It has been lightly edited for readability. Enjoy!

The Skinny

DevOps is an important operational model that drives velocity and accountability. In essence, developers, rather than IT Ops teams, are responsible for operating the applications they develop. The problem is that this is easier said than done. Developers often don't know what it means to operate an application in a production environment because they lack visibility across the stack; in a sense, they "live in a bubble." To implement this model successfully, you need to burst that bubble by:

Introducing transparency & visibility. Let app developers see what is happening around their applications. That surrounding ecosystem affects how their application runs just as much as the code they write.
Providing the proper tools. If developers are responsible for the uptime of their systems, they need well-configured monitoring, logging and collaboration tools; otherwise they will resist adoption. That is why many companies are establishing SRE or Platform teams focused on such tooling.

And how do we go about doing this? Watch the interview to find out.

The Interview

The Transcript

Yoram: Hello and welcome to CTO Perspectives, where we discuss unique perspectives on the most current issues in IT operations. Here to speak with us today, once again, is Elik Eizenberg, Chief Technology Officer and co-founder of BigPanda.

Elik: Hi Yoram. Really good to be here.

Yoram: It’s always such a pleasure and so insightful speaking with you. Today we’re going to be talking about a topic that is top of mind for many organizations, and that is the DevOps/SRE model for IT operations. It’s modern, it’s innovative, and it’s driving a lot of success for many companies like Google. That is why many companies that consider themselves modern are trying to implement it, only to find out once they’ve started doing so that it’s not as simple as they initially thought it would be. Isn’t that right?

Elik: You're spot on. A few companies out there have already fully transitioned to an SRE/DevOps model with SRE teams, and it has brought them huge business success. So it's not surprising to see many organizations trying to adopt similar models. But it is a very long journey.

I can tell you from my personal background: I served in Israeli Intelligence as part of an application team, and at a certain point I transitioned to owning an IT operations department. The difference between those two environments was night and day. Getting to a point where developers can really own the operation of their applications and start thinking like IT operations personnel requires a very big shift. So I understand why companies are struggling to get there and why it takes a long time.

Yoram: Let’s talk about that difference of night and day. What is the experience of a traditional developer? What’s his environment? What is he working with? What does he know?

Elik: The application developer's focus is on creating the code; that's what actually runs the application. Their weapon of choice is always the IDE. They log into the system, open their IDE, write their code, add tests, and sometimes compile it. They build the tool, they build the technology. But once it's ready, they hand that built code base over to the IT operations team to actually deploy and operate, and that's where their job is done. They own the code and the IDE, and those are the boundaries of their world.

Yoram: So now, when moving to a DevOps/SRE model, these developers are asked to also support and operate the application that they developed. And what are they experiencing now? What is it that they’re seeing, that they haven’t seen before?

Elik: They've built the code and they're ready to run it. Now it turns out that there are a lot of other components to think about. There's actually a big ecosystem surrounding your code. First of all, your code runs within an operating system, and that operating system often runs in a virtual machine on a hypervisor. It may also run on a physical server, which is connected to network and storage devices, and all of that runs within a data center. That's a very big ecosystem of components that can affect the performance and uptime of the application even more than the quality of the code the developers wrote. The other aspect is deployment. How do you deploy the code? How do you make sure it performs? How do you make sure the changes you've made don't cause issues and downtime for your users? These are also very complex processes that you have to start thinking about. Now think back to application developers who, until now, owned only the code in their IDE and suddenly realize they have to expand their view to all of these moving parts. You can see it's going to be a long uphill battle to get there.

Yoram: It’s pretty overwhelming, isn’t it?

Elik: It is. It’s a lot of information.

Yoram: So basically, in the traditional model, developers are, in a way, cushioned. They're sort of in a bubble, so to speak. And in order to move to the DevOps/SRE model, they need to expand their scope. They need to burst their bubble, if you will.

Elik: You know, I embrace that terminology of bursting the bubble. It doesn't say anything bad about developers; authoring code is hard enough as it is.

Realistically, if you want your application developers to start thinking about their code operationally, about how to operationalize their code base, you just have to burst the bubble. You have to make them aware of, and get them thinking about, all the moving parts in the ecosystem that surrounds their code base.

Yoram: And I’m assuming that since it’s a big move and it’s a bit overwhelming, there are a few things you have to do so people will want to do it or be successful in doing it. What do we need to do in order to burst this bubble?

Elik: My advice to a lot of our customers usually centers on two specific things. The first is transparency: how to gradually get your developers thinking about this full ecosystem. The key there is to expose them to signals from many different tiers of the stack. You show them alerts from your network, from VMware vRealize, from their cloud, from their storage and so on. You also have to start showing them user alerts. Gradually, you show them the entire ecosystem in which their code runs. The second piece of advice is about the tools developers need. One of the main things that allows them to be efficient and to embrace process changes is good tools. Give them access to logging tools, monitoring tools and collaboration tools for incident response. You really have to give them all the different tools that support the DevOps transition.

Sometimes companies look for shortcuts and try to avoid introducing new tools. They implement new processes right away without providing the right tools or building the infrastructure needed to support them, and it doesn't work. I think companies that are establishing SRE teams or platform teams to build those tools are doing the right thing.

Yoram: So let's get a little bit into the mechanics of how you create that visibility. What do you need to bring into that single pane of glass we discussed earlier?

Elik: The first piece is connecting to, and displaying, monitoring data, topology data and change data in one place. It's about making sure that data from across the stack (networking, storage, infrastructure, application, logging, user monitoring) is collected into one system, a single pane of glass. But you also have to think about changes, because changes are a big part of understanding how your system works and why it fails. So we need to collect all of our CI/CD and change management signals into the same system. Finally, topology information about the dependencies within our infrastructure, across the stack from the network up to the user, has to be analyzed and displayed within that same single pane of glass.

Yoram: So that's a lot of information. But I think we discussed this earlier: it's a lot of information, but you have to make sure you're only seeing what's relevant to you.

Elik: Spot on. When you think about a traditional IT operations team, their view is very much horizontal. They see all the operational data across all of the teams and business units in their environment. That results, without exaggeration, in hundreds of thousands, if not millions, of events and signals that they have to track.

When you think about a DevOps model, when a single developer needs to take ownership of his or her application, it’s very important to make sure they only see what they care about, what affects their application. You cannot expose them to this ocean of signals. You have to make sure there’s a way to show them all the signals across the stack – but in a very limited, narrow view of their own application.

Yoram: Otherwise, it’s too overwhelming.

Elik: Right.

Yoram: It's overwhelming enough to move to this model without having to deal with things that aren't related to you.

Elik: Exactly.

Yoram: So I’m assuming that this is a strategy that BigPanda implements into its product.

Elik: Exactly. We essentially have two components that support that vision. The first is what we call the Open Integration Hub. It's all about providing very good integrations with all of your alert, change and topology tools. So within a day of work, sometimes within an hour, you can connect all of your data sources to one pane of glass. BigPanda immediately exposes all of that information to anyone in your company who needs access to it. The second component is what we call environments, and that is exactly what allows you to expose just a narrow view of all that information. You can create an environment and say: "this is what should be part of this custom view: only the following applications and services." The combination of the two, access to all of the information coupled with the ability to narrow it down to what affects the individual developer, is what allows you to transition toward DevOps with BigPanda.

Yoram: So what’s your experience with BigPanda customers, or what is BigPanda’s experience with its customers using the BigPanda platform to help push forward the DevOps/SRE model? Do we see higher rates of adoption?

Elik: Definitely. I'll give you two very quick examples. One is a very big online travel company we work with. They have a huge vision around moving toward DevOps, and they actually brought in BigPanda as one of the driving forces to enable that transition. We work closely with their SRE teams to make that happen. An even better example may be one that started almost three years ago with a very big SaaS company, which initially brought BigPanda in to support their NOC, part of their operations team. They deployed BigPanda and had about 30 or 40 people using it daily for visibility across the stack. At some point in the last two years, they started transitioning to DevOps, and they realized that BigPanda was actually an opportunity. They gradually gave different application teams access to BigPanda, and they did that by giving each team a narrow view of its stack, an environment view. Fast forward to today, two years later: more than a thousand people use BigPanda weekly across that company. It really shows how BigPanda can be an enabler of the transition toward a DevOps model.

Yoram: So 30 times more people, in different teams, working on different applications and in different environments. That's pretty amazing. It always amazes me how much of a difference it makes when you set out to build a tool that comes from a specific, real need, and that tool then provides practical solutions to that need.

Elik: I agree. I have to say that I'm very proud of that aspect of our product. Companies are trying to transition toward a DevOps model that is going to bring them a lot of business success. They're going to be more agile, and they're going to have more accountability in the organization. But it's a truly transformational change that's very hard to implement. Knowing that BigPanda can help them on that journey is something I really enjoy about our product.

Yoram: Thanks so much. As I said at the beginning, this has been very insightful! Thank you for talking with me.

Elik: Thank you, Yoram. This was fun.

Yoram: And if you want to see more CTO Perspectives or learn about the BigPanda platform, please visit us at BigPanda.io. See you next time!
