Before coming to BigPanda and becoming a Solutions Architect, I managed Platform and Cloud Engineering teams in the financial technology (FinTech) space.
One of the FinTech companies I was at wanted to be as fast, agile, and as DevOps-focused as possible. However, it was anchored by regulations, a strong commitment to not changing what wasn’t broken, and an understandable fear of transforming digitally.
Given that context, leading the company’s introduction to modernized tooling, and its first major release in the cloud, is a journey I’m proud of to this day. But because of how our NOC was set up and how it worked, those successes led to some unexpected issues.
Always-busy L2s and L3s
In the NOC, the standard process of maintenance and troubleshooting was to have the team watch 10+ monitors full of data from Zabbix, CloudWatch, AppDynamics, SolarWinds, OEM, Splunk, and VCenter.
The L1 operators were often entry-level and had little to no chance of making sense of the constant onslaught of data. This forced them to frequently create Serena and/or Jira tickets and page out to L2 and L3 support…which meant that our L2s and L3s (senior engineers) were constantly troubleshooting and putting out fires, taking time away from what they really needed to work on.
We started to ask ourselves: “How do we continue to grow when all of our senior engineers are so busy supporting our new products on our new platform?”
Someone suggested that we needed SREs. “SREs everywhere!”, we heard. The SRE answer hinged on hiring full-stack engineers who understood networking, storage, cloud, containerization, and infrastructure as code. An engineer who understood all of those things would be able to troubleshoot and maintain our newest application, which was moving over $25 billion a quarter through the US financial system.
But good SREs are very hard to find, and they are very expensive, so it wasn’t a workable answer!
I knew I couldn’t find the SREs we wanted or find the L2/L3 headcount/budget we needed, so I began to look for ways to solve the very real and very painful problem of exploding alert noise.
That’s when I found BigPanda.
I realized that BigPanda could collect, normalize, de-duplicate, and correlate alert data from all of our monitoring tools and present it to the NOC in a single pane of glass. The thousands of alerts per day (well… probably more if we hadn’t essentially turned off AppDynamics alerting) would be correlated into maybe 25 incidents. Those incidents could then be triaged appropriately and automatically, or better yet, solved by the NOC team via direct links to runbooks or webhooks to remediation scripts.
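To make the idea concrete, here is a minimal sketch of that normalize → de-duplicate → correlate flow. This is purely illustrative and not BigPanda’s implementation; the field names (`host`, `check`, `status`) and the grouping-by-host rule are my own assumptions for the example.

```python
# Illustrative sketch (NOT BigPanda's actual implementation) of turning
# raw monitoring alerts into a smaller set of incidents.
# Field names ("host", "check", "status") are assumed for this example.
from collections import defaultdict

def normalize(alert):
    """Map tool-specific fields onto a common schema (assumed field names)."""
    return {
        "host": alert.get("host", "unknown"),
        "check": alert.get("check", "unknown"),
        "status": alert.get("status", "warning"),
    }

def correlate(raw_alerts):
    """De-duplicate identical alerts, then group the rest by host so that
    thousands of raw alerts collapse into a handful of incidents."""
    seen = set()
    incidents = defaultdict(list)
    for raw in raw_alerts:
        alert = normalize(raw)
        key = (alert["host"], alert["check"], alert["status"])
        if key in seen:  # duplicate of an alert we already hold
            continue
        seen.add(key)
        incidents[alert["host"]].append(alert)  # naive correlation by host
    return dict(incidents)

alerts = [
    {"host": "db-01", "check": "cpu", "status": "critical"},
    {"host": "db-01", "check": "cpu", "status": "critical"},  # duplicate
    {"host": "db-01", "check": "disk", "status": "warning"},
    {"host": "web-02", "check": "latency", "status": "critical"},
]
print(len(correlate(alerts)))  # 4 raw alerts -> 2 incidents
```

Real correlation engines use far richer logic (topology, time windows, machine learning), but even this toy version shows why the NOC sees 25 incidents instead of thousands of alerts.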
It was obvious to me that BigPanda could not only solve the operational support issues affecting my new FinTech platform, but also do the same for our legacy applications.
Enterprises have ready access to a number of good, powerful monitoring tools today. By design, these tools generate a staggering amount of alert data. Left ‘untreated’, this alert data can easily overwhelm most NOC operators and diminish the ROI from these tools.
BigPanda can make a real difference in that situation by consolidating alerts from various monitoring tools, reducing the noise, correlating alerts into actionable incidents, and, most importantly, helping enterprises reduce both the number of outages and the duration of any outages that do occur.
From a team standpoint, BigPanda will enable your engineers to be engineers and your NOC teams to fully support your new projects, services and applications. With BigPanda your production pipeline will thrive again!
Once I realized what BigPanda was capable of and how transformational it could be, I wanted to share it with the larger technical community…which is how I ended up in a Solutions Architect role at BigPanda today. Having worked closely with NOC managers and IT Ops teams and understanding their daily pain, it’s quite fulfilling to help them understand the very real and transformative difference BigPanda can make in their lives and their teams’ lives!