How to untangle monitoring noise and leverage observability best practices
Most organizations suffer from some form of alert noise, says Adam Blau, senior director of product marketing at BigPanda. “Alert noise is only going to increase as organizations support cloud-native applications spanning multiple public and private clouds, including ephemeral deployments and more. It’s not going to get easier for organizations to understand the signal from all those alerts being sent,” Blau said. So what can be done about it?
In this on-demand session from Pandapalooza, Blau and BigPanda’s Chief Technology Officer Jason Walker discuss how we got to this worrisome level of unhelpful alert noise and what organizations can do about it.
How did we get here?
Blau describes a few reasons why most organizations are dealing with unmanageable levels of low-quality alert noise. Part of the problem is that “it’s no one’s job to monitor alert quality,” he said. There is no feedback loop. “It gets painful because the NOC is receiving all these events and alerts, and there’s no way to go back [to the monitoring team] to tell them what is actionable and what is just noise.”
The impact of this problem is that the proportion of noise to signal increases each year. And for some organizations, only 1% of all alerts are actionable. This is where BigPanda can help.
BigPanda’s AIOps Event Correlation and Automation platform helps organizations bridge the gap between monitoring and network operations center (NOC) teams, creating a feedback loop that reveals how data is being ingested. It enables organizations to take action, better understand alert quality and make data-driven decisions about how to optimize monitoring tools.
In the full session, Blau details how to take advantage of Data Engineering and Unified Analytics. The highlights include:
- Measuring events with operator outcomes
- Properly classifying alerts into low and medium quality
- Using enrichment to enable technical and business context to drive actionable, high-quality alerts
The goal is to prevent users from having to filter and sort through low- and medium-quality alerts so they can focus on meaningful alerts.
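To make the classification and enrichment ideas above concrete, here is a minimal sketch in Python. The field names (`service_owner`, `business_impact`), severity values and quality thresholds are all illustrative assumptions, not BigPanda's actual schema or logic:

```python
from dataclasses import dataclass, field

# Hypothetical alert record; field names are illustrative only.
@dataclass
class Alert:
    source: str
    severity: str                  # e.g. "critical", "warning", "info"
    tags: dict = field(default_factory=dict)

def classify(alert: Alert) -> str:
    """Bucket an alert into high / medium / low quality using
    technical severity plus enriched business context."""
    has_business_context = (
        "service_owner" in alert.tags and "business_impact" in alert.tags
    )
    if alert.severity == "critical" and has_business_context:
        return "high"    # actionable: route to operators
    if alert.severity in ("critical", "warning"):
        return "medium"  # needs review before it reaches an operator
    return "low"         # noise candidate: suppress early in the pipeline

alerts = [
    Alert("prometheus", "critical",
          {"service_owner": "payments", "business_impact": "checkout"}),
    Alert("nagios", "info"),
    Alert("cloudwatch", "warning"),
]
# Only high-quality alerts reach the operator's queue.
actionable = [a for a in alerts if classify(a) == "high"]
```

The design point is that enrichment (the tags) is what promotes an otherwise ordinary critical alert to "actionable"; severity alone is not enough signal.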
Observability best practices
Walker then offered practical tips and best practices for improving monitoring quality and how to get started:
- Best practice 1: Focus on the domain within your control. Concentrate on obvious, low-quality alerts where you have high levels of technical and business understanding in order to connect the dots across your organization. These obvious, low-quality alerts should be on your hit list so you can eliminate them from the operations pipeline as early as possible.
- Best practice 2: Be guided by business context. Enrich alerts with real-world business context that has been defined, reviewed and agreed-upon by multiple teams.
- Best practice 3: Drive effectiveness with cross-functional review processes. Directly involving stakeholders helps establish a culture of ownership and commitment to improving alert and incident quality. Communication among teams about what percentage of alerts are noise and what are meaningful promotes an overall healthier pipeline.
- Best practice 4: Prioritize monitoring quality hygiene. “You have to guard this jealously,” said Walker. “You have to have something in place that tells you what your monitoring quality is and maintains the integrity of your pipeline.”
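Best practice 4 calls for "something in place that tells you what your monitoring quality is." One simple starting metric, sketched below, is the noise ratio: the fraction of alerts that were never acted on. The function and its inputs are illustrative assumptions, not a BigPanda feature:

```python
def noise_ratio(total_alerts: int, actionable_alerts: int) -> float:
    """Fraction of alerts that were never acted on — a basic
    monitoring-quality signal worth tracking over time."""
    if total_alerts == 0:
        return 0.0
    return 1 - actionable_alerts / total_alerts

# With the session's figure that only 1% of alerts are actionable,
# the pipeline is roughly 99% noise.
ratio = noise_ratio(10_000, 100)   # ~0.99
```

Tracking this ratio per monitoring tool, rather than globally, also tells you which integrations to put on the "hit list" from best practice 1.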
To learn more about these best practices and how BigPanda’s AIOps platform can help streamline the flood of alert noise in your organization, check out the full session here.