In part 1 of this series we defined algorithmic alert correlation and explained how it works.
The term “algorithmic” describes how data science applies machine learning techniques to solve alert storms, aka alert floods. Two flavors of machine learning are currently being applied to this problem: “black box” and “open box”. BigPanda applies open box machine learning, which leaves full control with the user. Open box ML is transparent and deterministic, returning consistent, reliable results. Users can also create and edit BigPanda’s alert correlation patterns on their own.
How to Edit Alert Correlation Patterns in BigPanda
This is where BigPanda shines as a far more effective and efficient platform. Our solution is the only one in the market that allows users to configure the alert correlation algorithm for their specific needs. In this way we empower IT Service Operations personnel – whether they are called NOC operators, systems admins, DevOps engineers or SREs. Alerts are intelligently correlated into logical groupings: raw data is parsed, and alerts are automatically clustered into groups based on the alert correlation patterns.
For example, when a set of alerts arrives that shares a common pattern, the alerts are grouped into a higher-level incident and flagged for more immediate attention. This eliminates the manual effort of connecting the dots, which can be difficult and time-consuming when a stack is composed of numerous, diverse monitoring tools.
BigPanda’s Algorithmic Alert Correlation Engine provides shelter from alert storms. Let’s dive a bit deeper and check out how it works.
While you create or modify alert correlation patterns, BigPanda’s Real-Time Preview appears in the right pane to show the results of your actions in real time. This way IT systems administrators have full control and can avoid trial and error. Sensitive changes do not need to be deployed to the Production environment just to see their effect.
BigPanda can process many different alert properties – including organic alert attributes, infrastructure naming conventions, configuration data and custom tags.
The first step is to isolate alert correlation patterns, because multiple events of a similar nature can point to greater underlying problems. Correlation patterns can be enhanced with enrichment and custom tags, for example tags derived from naming conventions already in use. Depending on how an organization structures its host or server names, an extraction rule can be created to aggregate alerts for any given cluster.
Let’s take a corporation that follows this convention: location.cluster.domain.url
So our example is: sjc.billing.cisco.acme.com
Using the Edit Custom Tag dialog, a custom tag can be defined to reference the cluster name and subsequently correlate alerts that occur under that same cluster. BigPanda supports regular expressions when defining the tag rule, so other source identifiers can be used for the variable. This allows a high degree of customization based on requirements.
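To make the extraction idea concrete, here is a minimal Python sketch of a regular expression that pulls the cluster name out of a host name following the location.cluster.domain.url convention. It is an illustration only and not BigPanda’s actual tag syntax; the names `CLUSTER_RULE` and `extract_cluster` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical extraction rule (not BigPanda's own syntax): the first
# dot-separated label is the location, the second is the cluster.
CLUSTER_RULE = re.compile(r"^(?P<location>[^.]+)\.(?P<cluster>[^.]+)\.")


def extract_cluster(host: str) -> Optional[str]:
    """Return the cluster label of a host name, or None if it does not match."""
    match = CLUSTER_RULE.match(host)
    return match.group("cluster") if match else None


print(extract_cluster("sjc.billing.cisco.acme.com"))  # billing
```

A host name that does not follow the convention, such as a bare `localhost`, yields `None`, so alerts from it would simply not receive the cluster tag.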
The rule can be refined further to create an incident based on a specified time window.
According to the settings in the Edit Correlation Pattern window, if multiple alerts come in within a 15-minute time frame and under the “corp” cluster, they are correlated into a single incident.
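The time-window behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BigPanda’s implementation: alerts are plain `(timestamp, cluster, message)` tuples, the window is measured from the first alert of each incident, and the function name `correlate` is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # time window taken from the example above


def correlate(alerts, window=WINDOW):
    """Group (timestamp, cluster, message) alerts into incidents.

    Assumption: an alert joins the open incident for its cluster if it
    arrives within `window` of that incident's first alert; otherwise it
    starts a new incident for that cluster.
    """
    incidents = []
    open_by_cluster = {}
    for ts, cluster, message in sorted(alerts, key=lambda a: a[0]):
        incident = open_by_cluster.get(cluster)
        if incident is None or ts - incident["start"] > window:
            incident = {"cluster": cluster, "start": ts, "alerts": []}
            incidents.append(incident)
            open_by_cluster[cluster] = incident
        incident["alerts"].append(message)
    return incidents


t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    (t0, "corp", "CPU high"),
    (t0 + timedelta(minutes=5), "corp", "Disk latency"),
    (t0 + timedelta(minutes=7), "billing", "Queue depth"),
    (t0 + timedelta(minutes=40), "corp", "CPU high"),
]
for incident in correlate(alerts):
    print(incident["cluster"], incident["alerts"])
```

In this sample, the first two “corp” alerts fall within 15 minutes of each other and land in one incident, while the “corp” alert 40 minutes later opens a new one. A real engine would likely also decide when a single alert counts as an incident; the sketch leaves that out.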
The Timeline is another compelling feature of BigPanda. Incidents can be displayed in a chronological format which helps to visually order alerts relative to each other. The Timeline feature aids IT administrators who wish to monitor only applications and infrastructure important to their particular role or purview.
Further Enriching Alert Correlation Patterns
Aside from its versatile editing capabilities, our algorithmic alert correlation engine benefits from integration with several other features of the BigPanda platform. For example, Smart Ticketing improves IT team collaboration by automatically creating a single ticket against a unified incident/alert cluster. Sharing rules can be set to automate team communication on root cause and speed incident resolution.
BigPanda’s alert correlation algorithm delivers real-time, actionable, performance-driven results. Our platform is the only solution currently in the market featuring this pattern definition preview capability. Moreover, functionality is constantly evolving to be more predictive, and pattern recognition and analysis will continue to be optimized with machine learning in the future.
To keep learning about alert correlation, check out the following: