
BigPanda Event Enrichment Engine: The secret ingredient for AIOps


James Beard, the pioneer of television cooking shows, once asked, “Where would we be without salt?” Salt is often underrated, even though it is the ingredient with the greatest impact on food and flavor in the modern world. It has its own taste, but it also balances and enhances the flavor of other ingredients. Salt boosts sweetness and blocks bitterness. It has scientifically proven capabilities to intensify flavor compounds that are too subtle to detect (i.e. “bringing out the taste in food”), and it makes meat juicier by unravelling its strands of protein and allowing them to sop up liquids.

I bet you didn’t know this, and maybe don’t really care now that you do… but you should. Because, as you now understand, salt has the biggest impact on what sustains you and keeps you going – your food.

Which brings us to event enrichment. 

Event enrichment has an outsized impact on what helps modern IT Ops organizations like yours succeed – your AIOps solution. Just as often overlooked, and generally misunderstood, event enrichment is a key ingredient in successful IT Ops in general, and AIOps in particular. Here’s why.

 

Structure as the first step

It’s been said dozens of times before – so one more time won’t hurt: Today’s IT Ops teams are drowning in a sea of alerts emanating from their many monitoring and observability tools. But it’s not just their number that’s an issue – it’s also that they lack context. Bombarded with tens or hundreds of thousands of context-less alerts, many IT Ops teams have reached the point where they’ve effectively “shut down” their monitoring: they’ve stopped using their monitoring tools proactively, and only go back to them after the fact, as a diagnostic tool, once something has broken and they’ve been notified.

For many of them, the way out of this undesirable scenario, and the path back to proactive monitoring, is through enrichment.  

By adding context to events, teams can organize them by their relationships: the app they’re related to, the server they are running on, the business or service being affected, and more. This is key to dealing with the overwhelming alert volumes common in IT operations, as operators can now look at groups of related alerts instead of a flat stream.
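As a minimal sketch of what that grouping can look like, assume alerts are plain dictionaries and that enrichment has already added a `service` tag to each one. The field names and sample alerts here are illustrative assumptions, not BigPanda's actual schema.

```python
from collections import defaultdict

# Hypothetical enriched alerts; field names are illustrative only.
alerts = [
    {"host": "web-01", "check": "cpu_high", "service": "checkout"},
    {"host": "web-02", "check": "latency", "service": "checkout"},
    {"host": "db-07", "check": "disk_full", "service": "billing"},
]


def group_by_tag(alerts, tag):
    """Group alerts by the value of one enrichment tag."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts that were never enriched with this tag land in "unknown".
        groups[alert.get(tag, "unknown")].append(alert)
    return dict(groups)


groups = group_by_tag(alerts, "service")
for service, members in groups.items():
    print(service, [a["host"] for a in members])
```

An operator looking at `groups` sees two buckets of related alerts rather than three unrelated lines, which is the whole point of adding the tag.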

It is also a first step towards succeeding with AIOps.

 

Correlation and root cause analysis

Adding structure helps you understand the basic context of events, but it does not by itself deal with their overflowing numbers or surface root cause. For teams to be able to effectively detect and resolve incidents, AIOps tools have to correlate the tens or hundreds of thousands of incoming alerts into a small number of high-quality, actionable incidents.

The ability of AI/ML to detect correlation patterns and “compress” alerts relies heavily on the quality of the data fed to it. Context-less data leads to limited, low-quality incidents as a result of weak correlation. Enriching alerts collected from all aspects of your IT stack or technology domains – aka cross-domain enrichment – supplies the AI/ML algorithms with the information they need to correlate alerts with a high degree of efficiency, effectively reducing IT noise to humanly acceptable levels.
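To make the dependency on enrichment concrete, here is a deliberately simple, rule-based stand-in for correlation (not the ML approach described above, and not BigPanda's algorithm). It assumes each alert carries a numeric `timestamp` in seconds and, when enriched, a `service` tag. All names are hypothetical.

```python
def correlate(alerts, tag, window_seconds):
    """Merge alerts that share a tag value into one incident, as long as each
    alert arrives within window_seconds of the previous one in that incident."""
    incidents = []
    latest_by_key = {}  # tag value -> most recent incident for that value
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        key = alert.get(tag)
        incident = latest_by_key.get(key)
        if (
            key is not None
            and incident is not None
            and alert["timestamp"] - incident["last_seen"] <= window_seconds
        ):
            incident["alerts"].append(alert)
            incident["last_seen"] = alert["timestamp"]
        else:
            incident = {"key": key, "alerts": [alert], "last_seen": alert["timestamp"]}
            incidents.append(incident)
            if key is not None:
                latest_by_key[key] = incident
    return incidents


# Illustrative data: four enriched alerts and one context-less alert.
sample = [
    {"timestamp": 0, "host": "web-01", "service": "checkout"},
    {"timestamp": 60, "host": "web-02", "service": "checkout"},
    {"timestamp": 90, "host": "db-07", "service": "billing"},
    {"timestamp": 120, "host": "unknown-host"},
    {"timestamp": 4000, "host": "web-01", "service": "checkout"},
]
incidents = correlate(sample, "service", window_seconds=300)
print(len(sample), "alerts ->", len(incidents), "incidents")
```

Note what happens to the alert without a `service` tag: it can never be merged with anything and always becomes its own incident. That is the weak correlation that context-less data produces.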

In a similar fashion, successful root cause analysis relies on the ability to understand and leverage the different dependencies between infrastructure and application components in modern environments. Some of this information is buried in incoming alert streams, and some of it is contained in external data sources such as asset and inventory management systems, orchestration tools, APM service or flow maps, CMDBs and more. Cross-domain enrichment adds this much-needed context. In doing so, it allows infrastructure-related root cause to surface, and it helps match incidents to the problem changes (aka root cause changes) that are causing incidents and outages.

 

Automation

Now that you’ve identified and surfaced the root cause of an incident, you need to respond to it, remediate it, and eventually resolve it. 

Enrichment drives value here as well: it ensures the creation of incidents whose payload data consists of all the right topological, operational and other contextual data needed to drive downstream automations and workflows – both inside the AIOps tools, and inside the collaboration tools integrated with them. For example, the enrichment data within the incident can help open ServiceNow tickets, prioritize them, assign them to specific teams, notify teams via PagerDuty, trigger runbooks on Rundeck and more.
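Here is a rough sketch of how enrichment tags might drive that kind of routing. It only builds a ticket-like dictionary and does not call any real ServiceNow, PagerDuty or Rundeck API. The routing table, tag names and output field names are all assumptions made for illustration.

```python
# Hypothetical routing table: which team owns which service.
TEAM_BY_SERVICE = {
    "checkout": "payments-oncall",
    "billing": "finance-platform",
}


def build_ticket(incident):
    """Turn an enriched incident into a ticket-like payload.

    Priority and assignment come entirely from enrichment tags, so an
    incident without them falls back to a default triage queue.
    """
    tags = incident["tags"]
    service = tags.get("service", "unknown")
    priority = "P1" if tags.get("environment") == "production" else "P3"
    return {
        "short_description": f"{service}: {incident['alert_count']} correlated alerts",
        "assignment_group": TEAM_BY_SERVICE.get(service, "noc-triage"),
        "priority": priority,
        "context": tags,
    }


incident = {
    "alert_count": 12,
    "tags": {"service": "checkout", "environment": "production", "cluster": "prod-east"},
}
ticket = build_ticket(incident)
print(ticket["assignment_group"], ticket["priority"])
```

Without the `service` and `environment` tags, the same incident would land in the default queue at low priority and wait for a human to work out who owns it.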

 

Cross-domain enrichment – a must for cloud and hybrid environments

Now that we’ve understood why enrichment is an essential ingredient for successful AIOps, let’s discuss why cross-domain enrichment is absolutely necessary for cloud and hybrid environments. 

Many teams that operate in today’s modern environments find that due to their dynamic nature, their CMDB is essentially useless. If a host name consists of a 16-character combination of letters and numbers, and changes every few hours or minutes, trying to use a CMDB to connect a service to a CI is futile. CMDBs are helpful in static or slow-changing on-prem environments. But in modern hybrid and cloud environments they are almost always out-of-date. That’s why some teams don’t even bother to build or maintain one these days.

That’s where cross-domain enrichment can be extremely helpful. 

With cross-domain enrichment, enrichment data from any and all observability, monitoring, topology and other operational tools is brought into your AIOps tool. As previously mentioned, this includes enrichment data from inventory and asset management systems, cloud orchestration systems, APM/network maps and more (including, potentially, any CMDBs still in use). Their data is used to enrich the alerts in real time. Complex dependencies can now be easily understood by looking at the alerts themselves: for example, which microservices-based applications are associated with which systems, or which container-based application ID is associated with an alert.
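A minimal sketch of this kind of lookup, assuming a periodically refreshed inventory snapshot keyed by an opaque instance ID. The snapshot contents, the ID and the tag names are invented for illustration, and this is not how any particular product stores the data.

```python
# Hypothetical snapshot pulled from a cloud inventory or orchestration source.
INVENTORY = {
    "i-0a1b2c3d4e5f6a7b": {
        "service": "checkout",
        "cluster": "prod-east",
        "owner": "payments",
    },
}


def enrich_from_inventory(alert, inventory):
    """Return a copy of the alert with inventory tags added.

    Fields already present on the alert win over inventory values, and an
    alert whose host is not in the snapshot is returned unchanged.
    """
    context = inventory.get(alert.get("host"), {})
    return {**context, **alert}


raw = {"host": "i-0a1b2c3d4e5f6a7b", "check": "http_5xx"}
enriched = enrich_from_inventory(raw, INVENTORY)
print(enriched)
```

The opaque host name means nothing to an operator on its own, but after the lookup the alert says which service, cluster and owner it belongs to.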

 

The BigPanda Event Enrichment Engine

Now that we’ve understood the importance of enrichment, it’s time to discuss BigPanda’s Event Enrichment Engine.

BigPanda employs a best-in-class, cross-domain Event Enrichment Engine, which is coupled with the BigPanda platform’s Open Integration Hub and Open Box Machine Learning (OBML), to provide a market-leading AIOps-driven Event Correlation and Automation platform.

BigPanda’s Open Integration Hub ingests the three datasets critical to enrichment – and by extension, to successful AIOps solutions:

  • “Raw” streaming alert data from all enterprise observability and monitoring tools, 
  • Operational and topological context from all relevant sources, and 
  • Change data from all change feeds in the environment. 

This ensures that BigPanda’s Event Enrichment Engine has access to all the needed information for cross-domain enrichment.

BigPanda’s Event Enrichment Engine then enriches the ingested alerts by manipulating contextual data that’s buried in the incoming alert stream, and by adding new topological and operational data collected from the other sources and tools. This is done through user-defined enrichment logic that can perform millions of enrichment actions every day – allowing you to scale your operations as needed.

BigPanda’s Event Enrichment Engine also offers a purpose-built, intuitive user interface that helps you create and maintain this enrichment logic, with the unique ability to preview the results of the enrichment logic before going live.


It also supports advanced enrichment use-cases, such as user-defined ordering of enrichment logic, powerful configuration APIs and more.
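To see why the order of enrichment logic matters, consider this hedged sketch with two hypothetical rules: one extracts a `region` tag from the host name, and one maps that tag to a datacenter. The host naming pattern, the rule functions and the lookup table are assumptions for illustration, not BigPanda's rule format.

```python
import re

# Hypothetical mapping from a short region code to a datacenter name.
DATACENTER_BY_REGION = {"use1": "us-east-1", "euw1": "eu-west-1"}


def extract_region(alert):
    """Extraction: pull a region tag out of a host name like 'web-use1-042'."""
    match = re.match(r"^[a-z]+-(?P<region>[a-z]+\d)-\d+$", alert.get("host", ""))
    return {"region": match.group("region")} if match else {}


def map_datacenter(alert):
    """Mapping: translate the region tag (set by an earlier rule) to a datacenter."""
    datacenter = DATACENTER_BY_REGION.get(alert.get("region"))
    return {"datacenter": datacenter} if datacenter else {}


def apply_rules(alert, rules):
    """Apply rules in order; each rule sees tags added by the rules before it."""
    enriched = dict(alert)
    for rule in rules:
        enriched.update(rule(enriched))
    return enriched


alert = {"host": "web-use1-042", "check": "cpu_high"}
in_order = apply_rules(alert, [extract_region, map_datacenter])
wrong_order = apply_rules(alert, [map_datacenter, extract_region])
print(in_order)
print(wrong_order)
```

With the rules reversed, the mapping runs before `region` exists, so the `datacenter` tag is never added. Being able to control rule order avoids this class of silent gap.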


Post-enrichment, enriched alerts are passed on to BigPanda’s OBML which leverages the added context to correlate the alerts into high-quality incidents, reducing IT noise by over 95%. BigPanda’s OBML also uses the enriched data to surface probable root cause – including root cause changes. Finally, the enrichment data inside incidents is used to trigger workflow automations to help manage and remediate the incident.

As you can see, enrichment in general, and BigPanda’s Event Enrichment Engine in particular, has a significant impact on the entire incident management lifecycle: it shortens each stage and improves the quality of the data available in it, so that human IT Ops teams are able to detect and resolve incidents and outages faster than ever before. In other words, AIOps can’t succeed without enrichment. It truly is the secret ingredient of successful AIOps.

To learn more about enrichment and BigPanda’s Event Enrichment Engine, please visit our Event Enrichment Engine product page.