Critical and sev1 incidents are always a priority, but what about the dozens, often hundreds, of lower-priority incidents that sit in a queue waiting for a first-response engineer to get to them? Do you find that no matter how much effort your team puts into shrinking the queue, it always seems to grow? If this sounds familiar, this blog post is for you.
Manually Routing Incidents Is Painful in More Ways than One
Low-priority incidents that sit in a queue waiting for manual routing carry greater risks than first meet the eye, chief among them escalation into outages that could have been avoided. An incident in a warning state may be an early sign of an oncoming outage, so the sooner an engineer acts on it, the better the chances of resolving it before it escalates.
Another byproduct of queued incidents is the untargeted bridge call. Have you ever sat on a bridge call and asked yourself “Why am I here?” while the incident commander spends valuable minutes explaining what they need from you and the other participants? Those minutes could have been spent mitigating the situation, had you received the incident before the call.
And of course these bottlenecks don’t help your SLAs, above all your Mean Time to Acknowledge (MTTA), as you strive to provide better service and (let’s face it) hit your personal KPIs.
Automated Incident Routing to the Rescue
Automating incident routing can relieve incident queue bottlenecks. Here are some guidelines, based on our experience:
- Use incident enrichment mechanisms to add information about the team likely to be called for each type of alert based on location, infrastructure, etc.
- Then, filter the incidents each team sees based on this information, in effect creating team “inboxes”. This can substantially lower your MTTA.
- If your incident management system can provide an assessment of an incident’s probable root cause, use this information to streamline the bridge call team assembly. You can substantially lower your Mean Time To Assemble this way.
- Finally, build all of the above into your system’s automation.
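The first two guidelines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular incident management product works: the `Incident` fields, the `ROUTING_TABLE` lookup, the team names, and the `triage` fallback inbox are all hypothetical. It tags each incident with the team most likely to own it, based on location and infrastructure, and then groups incidents into per-team inboxes.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical enrichment table: (location, infrastructure) -> owning team.
ROUTING_TABLE = {
    ("us-east", "database"): "dba-team",
    ("us-east", "network"): "netops-team",
    ("eu-west", "database"): "dba-team",
}
# Assumed fallback inbox for incidents that match no routing rule.
DEFAULT_TEAM = "triage"


@dataclass
class Incident:
    id: str
    location: str
    infrastructure: str
    severity: str
    tags: dict = field(default_factory=dict)


def enrich(incident):
    """Add a 'team' tag based on the incident's location and infrastructure."""
    key = (incident.location, incident.infrastructure)
    incident.tags["team"] = ROUTING_TABLE.get(key, DEFAULT_TEAM)
    return incident


def build_inboxes(incidents):
    """Enrich every incident, then group incident ids by owning team."""
    inboxes = defaultdict(list)
    for incident in incidents:
        enrich(incident)
        inboxes[incident.tags["team"]].append(incident.id)
    return dict(inboxes)


incidents = [
    Incident("INC-1", "us-east", "database", "warning"),
    Incident("INC-2", "us-east", "network", "warning"),
    Incident("INC-3", "ap-south", "storage", "warning"),
]
inboxes = build_inboxes(incidents)
print(inboxes)
# {'dba-team': ['INC-1'], 'netops-team': ['INC-2'], 'triage': ['INC-3']}
```

Sending unmatched incidents to a catch-all inbox rather than dropping them is a deliberate choice in this sketch: a gap in the routing rules then shows up as a visible backlog instead of a silently lost incident.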
Incident Routing with BigPanda
BigPanda allows you to easily automate incident routing:
- The Open Integration Hub allows incident enrichment, providing the ability to tie routing information to alerts.
- The Operations Console allows setting up custom environments that reflect the different inboxes each team is responsible for. Applying custom rules on an environment will automate incident routing to a team’s inbox.
- Dynamic Incident Titles in the Operations Console surface an incident’s probable root cause, allowing quick assembly of the right bridge call team when needed.
- Incident sharing automation allows automated notification in any collaboration tool.
This short video shows you how automated incident routing works in BigPanda: