BigPanda’s Event Enrichment Engine
Cross-domain alert enrichment for optimal AIOps-powered event correlation and automation
The success of AIOps tools, including AIOps-powered event correlation and automation tools, relies heavily on the quality of the data fed to their AI/ML algorithms.
In highly complex and dynamic IT environments, this means cleansing, preparing and enriching the alert data payload with rich operational and topological context, which it often lacks, and doing so at scale. Without this enrichment, AIOps-driven tools are limited in their ability to eliminate IT noise, surface the root cause of problems, and automate manual incident management tasks.
“BigPanda’s Event Enrichment Engine helps structure our event data and adds more context to alerts for our Operations team to triage issues quickly and achieve better alert compression rates.”
Senior Digital Enterprise Monitoring Services Manager
BigPanda’s Event Enrichment Engine offers cross-domain enrichment capabilities at scale to assure AIOps success.
BigPanda ingests “raw” streaming alert data from all observability and monitoring tools, regardless of the vendor and monitoring domain. This is achieved through its Open Integration Hub, which normalizes the data into key-value pairs, or tags. At the same time, the platform ingests operational and contextual data from asset and inventory systems as well as topology data from available CMDBs, APM and orchestration tools.
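As a rough illustration of what normalizing a vendor-specific alert into key-value tags can look like, consider the minimal Python sketch below. It is not BigPanda’s integration code: the `normalize` helper, the field map and the sample payload are all hypothetical and chosen only to show the idea.

```python
# Illustrative sketch only. normalize(), the field map and the sample payload
# are assumptions made for this example, not part of any BigPanda API.

def normalize(raw_alert: dict, field_map: dict) -> dict:
    """Flatten a nested, vendor-specific alert into a flat dict of string tags."""
    tags = {}
    for source_field, tag_name in field_map.items():
        value = raw_alert
        # Walk dotted paths such as "labels.instance" into the nested payload.
        for key in source_field.split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        # Fields missing from the payload are simply skipped.
        if value is not None:
            tags[tag_name] = str(value)
    return tags


raw = {
    "labels": {
        "instance": "billing-db01.prod-east.nyc2.example.com",
        "alertname": "HighCPU",
    },
    "state": "firing",
}
field_map = {
    "labels.instance": "host",
    "labels.alertname": "check",
    "state": "status",
}
tags = normalize(raw, field_map)
```

Once every source is reduced to the same flat tag shape, downstream enrichment logic can treat alerts from different monitoring tools uniformly.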
Normalized alerts are then fed, along with the contextual data, to the Event Enrichment Engine, which combines them to produce context-rich alerts ready for correlation, using multiple programmatic techniques:
Extracting values hidden in the alert and/or contained in contextual data, and adding them as new enrichment tags to the alert. For example: extracting the name of a cluster from the alert’s host data, and adding it as a “cluster” tag to the alert.
Composing several alert and contextual data values into a new value, and adding it as a new enrichment tag to the alert. For example: combining an alert’s service name and alert check to create a specific runbook URL, and adding it as a “runbook_url” tag to the alert.
Extracting the relevant context from multi-column enrichment maps obtained from external sources. For example: using the service name to look up its owner and data center location and adding these two values as new tags to an existing alert.
Enrichment tag values can be calculated using one of the actions listed above, or by combining and sequencing several of them, with previously computed enrichment tags available as inputs. Both the enrichment logic and its execution order are defined in the Event Enrichment Engine’s intuitive user interface.
Once BigPanda’s Event Enrichment Engine processes and enriches alerts, they contain all the necessary context and data for AIOps-driven:
- Correlation and noise reduction
- Probable root cause determination with a high degree of accuracy
- Triggering of workflow automations
Main features of the BigPanda Event Enrichment Engine
Comprehensive logic, with previews
BigPanda’s Event Enrichment Engine supports comprehensive enrichment logic via regex operators. This logic can be easily defined by users and used to extract and manipulate relevant alert payload data to create enrichment tags, which are then added back to the alerts. This entire process is available through an intuitive UI, and the results can be previewed before being implemented.
User-defined enrichment logic can be implemented using several techniques:
Using the extraction tag creation process, users can extract important metadata from existing payload tag values and create new enrichment tags. For example, a hostname payload tag often includes several key pieces of information, such as the alert’s service, node, cluster, datacenter and domain. Each of these pieces of context can be extracted into its own tag.
Using the composition tag creation process, users can create new tags by combining the values from any number of existing tags and/or text strings. For example, a runbook URL tag can be created by combining the value of a company’s base wiki URL with the values of an alert’s cluster tag and check tag.
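The two techniques above can be sketched in a few lines of Python. This is only an approximation of the idea, not the Engine’s implementation: the hostname convention (`service-node.cluster.datacenter.domain`), the regex, the wiki base URL and the helper names `extract_tags` and `compose_tag` are all assumptions made for the example.

```python
import re
from urllib.parse import quote

# Assumed hostname convention: <service>-<node>.<cluster>.<datacenter>.<domain>
HOST_PATTERN = re.compile(
    r"^(?P<service>[a-z]+)-(?P<node>[a-z]+\d+)"
    r"\.(?P<cluster>[a-z0-9-]+)"
    r"\.(?P<datacenter>[a-z0-9]+)"
    r"\.(?P<domain>.+)$"
)

# Hypothetical base wiki URL combined with two tag placeholders.
RUNBOOK_TEMPLATE = "https://wiki.example.com/runbooks/{cluster}/{check}"


def extract_tags(alert: dict) -> dict:
    """Extraction: pull named regex groups out of the host tag as new tags."""
    enriched = dict(alert)
    match = HOST_PATTERN.match(alert.get("host", ""))
    if match:
        enriched.update(match.groupdict())
    return enriched


def compose_tag(template: str, alert: dict):
    """Composition: fill a template from existing tags; None if a tag is missing."""
    # URL-encode each value so it is safe to embed in a path segment.
    values = {name: quote(str(value), safe="") for name, value in alert.items()}
    try:
        return template.format_map(values)
    except KeyError:
        return None


alert = {"host": "billing-db01.prod-east.nyc2.example.com", "check": "HighCPU"}
enriched = extract_tags(alert)
enriched["runbook_url"] = compose_tag(RUNBOOK_TEMPLATE, enriched)
```

Note that the composed runbook URL depends on the `cluster` tag produced by the extraction step, which is why the order in which tags are computed matters.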
BigPanda’s Event Enrichment Engine also supports enrichment tag creation through mapping (available in an upcoming release), for use in situations where the logic is complex or there is a need to incorporate external information sources such as a CMDB, a registry or an operational spreadsheet.
For example, users can enrich alerts with the values of the responsible team and relevant runbooks, by using a lookup table that returns these values based on the alert’s application tag value. The Engine’s intuitive UI allows users to upload and display the table, and then easily choose the relevant values from within the tag creation screen.
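Conceptually, a mapping enrichment of this kind is a keyed table lookup. The Python sketch below shows one way it could work, assuming the table is supplied as CSV; the column names, the sample rows and the helpers `load_map` and `enrich_from_map` are hypothetical and do not describe the product’s actual mapping feature.

```python
import csv
import io

# Hypothetical lookup table, e.g. exported from a CMDB or a spreadsheet.
TABLE_CSV = """application,team,runbook
billing,payments-ops,https://wiki.example.com/runbooks/billing
search,search-sre,https://wiki.example.com/runbooks/search
"""


def load_map(csv_text: str, key_column: str) -> dict:
    """Index the table rows by the chosen key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[key_column]: {k: v for k, v in row.items() if k != key_column}
        for row in reader
    }


def enrich_from_map(alert: dict, table: dict, key_tag: str) -> dict:
    """Copy the matching row's columns onto the alert as new tags."""
    enriched = dict(alert)
    row = table.get(alert.get(key_tag))
    if row:
        enriched.update(row)
    return enriched


table = load_map(TABLE_CSV, key_column="application")
enriched = enrich_from_map({"application": "billing", "check": "HighCPU"}, table, "application")
unmatched = enrich_from_map({"application": "unknown"}, table, "application")
```

Alerts whose key has no row in the table pass through unchanged, so an incomplete table never blocks alert processing in this sketch.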
Multi-type enrichment logic and tag execution ordering
In addition to creating enrichment tags using one of the techniques previously listed, BigPanda’s Event Enrichment Engine lets users combine a series of consecutive enrichment actions.
For example, a multi-type tag can be created by first extracting a value, then combining it with another value, and then using mapping to look up its final value based on this combination. Because each step depends on the output of the previous one, the order of this logic matters; it can be rearranged through a drag-and-drop UI.
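To see why execution order matters for interdependent tags, here is a minimal Python sketch of an extract, compose, then map chain. The step functions, the `OWNERS` table and `run_pipeline` are hypothetical names invented for this example; they only model the concept of ordered enrichment steps.

```python
import re

# Hypothetical mapping from "<application>/<cluster>" to an owning team.
OWNERS = {"billing/prod-east": "payments-ops"}


def extract_cluster(alert: dict) -> dict:
    """Step 1 (extraction): take the cluster from the second hostname label."""
    match = re.search(r"\.(?P<cluster>[a-z0-9-]+)\.", alert.get("host", ""))
    return {"cluster": match.group("cluster")} if match else {}


def compose_key(alert: dict) -> dict:
    """Step 2 (composition): combine application and cluster into a lookup key."""
    if "application" in alert and "cluster" in alert:
        return {"owner_key": f'{alert["application"]}/{alert["cluster"]}'}
    return {}


def map_owner(alert: dict) -> dict:
    """Step 3 (mapping): resolve the composed key to its final value."""
    owner = OWNERS.get(alert.get("owner_key"))
    return {"owner": owner} if owner else {}


def run_pipeline(alert: dict, steps: list) -> dict:
    """Apply steps in order; each step sees the tags added by earlier steps."""
    enriched = dict(alert)
    for step in steps:
        enriched.update(step(enriched))
    return enriched


alert = {"host": "billing-db01.prod-east.nyc2.example.com", "application": "billing"}
ordered = run_pipeline(alert, [extract_cluster, compose_key, map_owner])
misordered = run_pipeline(alert, [map_owner, compose_key, extract_cluster])
```

In the correctly ordered run the alert ends up with an `owner` tag. In the misordered run the mapping and composition steps execute before their inputs exist, so no `owner` tag is produced.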
BigPanda’s Event Enrichment Engine creates high-quality incidents that deliver tangible benefits to IT Ops, NOC and DevOps/SRE teams across the incident management lifecycle.
Improved correlation and alert noise suppression
Enriched alerts turbocharge event correlation to facilitate real-time incident detection and noise reduction.
More accurate root cause analysis
Enrichment makes it easier for BigPanda’s Open Box Machine Learning technology to surface both infrastructure-related probable root causes and root cause changes with a high degree of accuracy.
Deeper automatic workflows
The payload data of high-quality incidents contains vital context that can be used to trigger precise workflow automations, accelerating incident response, remediation and resolution.