How Autodesk reduced incidents by 69% and improved MTTR by 85%

About

Autodesk, Inc. is a leader in 3D design, engineering and entertainment software. Autodesk makes software products and services for the architecture, engineering, construction, manufacturing, media, education, and entertainment industries.

Challenge

Prior to adopting BigPanda, Autodesk’s IT Ops team faced an average of 100,000 application alerts per month from 25 tools, including monitoring platforms such as Amazon CloudWatch, Splunk, Catchpoint, Dynatrace, LogicMonitor, CloudGenix, New Relic and ELK, as well as ServiceNow and Slack. The small, globally distributed NOC quickly ran out of bandwidth to filter thousands of noisy inbound alerts and identify the actionable ones. To investigate each incident, the NOC had to switch between several monitoring dashboards, which added manual steps and lengthened the time needed to find the root cause. Escalations to DevOps and Site Reliability Engineering teams caused delays and reduced the time these teams could spend on new features and innovation.

Finally, Autodesk’s ServiceNow tickets lacked critical context, such as host name, business priority and the responsible escalation team, that helps responders act fast. Instead, the NOC had to enter this information into each ticket manually and would assign the ticket to multiple response teams without knowing which one was responsible. Analyzing all the alerts that made up an incident was complex, and entering information into the ITSM platform by hand was error-prone; together, these increased the mean time to resolve (MTTR).

Solution

To address these challenges, Autodesk turned to BigPanda’s Intelligence and Automation platform. To improve and accelerate its IT Ops, Autodesk used several key BigPanda capabilities, including:

  1. Contextual data using custom tags
  2. Environments and Groups
  3. Smart Ticketing
  4. Analytics

Contextual Data with Custom Tags

BigPanda’s enrichment feature adds contextual information to alerts to make them actionable. Autodesk uses its host naming convention to extract enrichment tags that describe the alert’s device, domain, device function, location and more. The team then created custom incident tags that carry business context and process information, such as response priority, the responsible team and the relevant runbook, so support teams can act on an incident quickly without escalating it.
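As an illustration only, hostname-based enrichment of this kind might look like the following Python sketch. The naming convention, the HOSTNAME_PATTERN regex, the FUNCTION_CONTEXT lookup table and the enrich() function are all hypothetical; they are not Autodesk’s actual scheme or part of BigPanda’s API. For brevity, the sketch attaches the business-context tags directly to the alert rather than to a separate incident.

```python
import re

# Hypothetical naming convention (illustrative only, not Autodesk's real scheme):
#   <location><site>-<function>-<device><index>.<domain>
#   e.g. "sfo1-web-lb02.prod"
HOSTNAME_PATTERN = re.compile(
    r"^(?P<location>[a-z]{3})(?P<site>\d+)-"
    r"(?P<function>[a-z]+)-"
    r"(?P<device>[a-z]+)(?P<index>\d+)\."
    r"(?P<domain>[a-z]+)$"
)

# Hypothetical lookup table mapping a device function to business context.
FUNCTION_CONTEXT = {
    "web": {"priority": "P1", "team": "web-sre", "runbook": "runbooks/web"},
    "db": {"priority": "P1", "team": "dba", "runbook": "runbooks/db"},
}


def enrich(alert: dict) -> dict:
    """Return a copy of the alert with tags derived from its host name."""
    enriched = dict(alert)  # copy so the original alert is not mutated
    match = HOSTNAME_PATTERN.match(alert.get("host", "").lower())
    if match is None:
        # Unparseable host names are left unenriched rather than guessed.
        return enriched
    tags = match.groupdict()  # location, site, function, device, index, domain
    enriched.update(tags)
    # Add business context for known device functions, if any.
    enriched.update(FUNCTION_CONTEXT.get(tags["function"], {}))
    return enriched
```

Host names that do not match the pattern pass through unchanged, so a malformed name never produces misleading tags.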

Environments and Groups

BigPanda helps Autodesk focus on the most relevant incidents by organizing them into environments based on the areas of responsibility and processes within the IT and DevOps teams. Environment groups add another level of hierarchy: each group contains one or more environments, which helps Autodesk’s IT team organize incidents by common functions such as business services, teams and infrastructure areas.
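One way to picture environments and environment groups is as named filters over incident tags, with a group being a collection of such filters. The sketch below assumes that model; the Environment and EnvironmentGroup classes and the example "function" tag values are hypothetical and do not reflect how BigPanda defines environments internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: an incident is represented only by its tags.
IncidentTags = Dict[str, str]


@dataclass
class Environment:
    """A named view of incidents, defined by a predicate over incident tags."""
    name: str
    matches: Callable[[IncidentTags], bool]


@dataclass
class EnvironmentGroup:
    """A higher level of hierarchy containing one or more environments."""
    name: str
    environments: List[Environment] = field(default_factory=list)

    def incidents(self, incidents: List[IncidentTags]) -> List[IncidentTags]:
        """Return incidents that fall into at least one environment in the group."""
        return [
            incident
            for incident in incidents
            if any(env.matches(incident) for env in self.environments)
        ]


# Example environments keyed on a hypothetical "function" tag.
databases = Environment("Databases", lambda i: i.get("function") == "db")
web_tier = Environment("Web tier", lambda i: i.get("function") == "web")
infrastructure = EnvironmentGroup("Infrastructure", [databases, web_tier])
```

Because an environment is just a predicate, one incident can belong to several environments, and the group view still returns it only once.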

Smart Ticketing with Auto Sharing

Autodesk integrated the BigPanda platform with ServiceNow and Slack to establish Smart Ticketing. With the Auto Sharing feature, BigPanda automatically notifies key team members and creates tickets that include incident information, links to incident details and real-time updates on the latest status of each BigPanda incident. The ServiceNow integration helped Autodesk streamline its workflow across the incident lifecycle, from detection to investigation to remediation.

Analytics and Snapshot

BigPanda’s analytics help Autodesk visualize trends in its monitoring data. The BigPanda dashboard provides a dynamic, real-time view of the data, making it well suited to operational health monitoring and situational awareness. Analytics Reports provide on-demand snapshots for specific time periods, which help the IT team see historical trends and identify problem areas in the infrastructure.

Benefits

By adopting BigPanda, Autodesk gave its team much deeper insight into the issues underlying its alert events, improved its ticketing processes and drove greater operational efficiency. The IT team consolidated noisy alerts into actionable incidents, reducing incident volume by 69% and improving MTTR by 85%. Logical and time-based correlation patterns also helped the team detect data center-wide anomalies.
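The time-based correlation mentioned above can be illustrated with a generic gap-based grouping: alerts that share a set of tag values, and arrive within a fixed window of the previous alert in that group, are merged into one incident. This is a simplified stand-in written for illustration, not BigPanda’s correlation engine; the correlate() function, the CorrelatedIncident class, the tag names and the window length are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical alert shape: a dict with a numeric "timestamp" plus arbitrary tags.
Alert = Dict[str, Any]


@dataclass
class CorrelatedIncident:
    key: Tuple[Any, ...]          # tag values shared by every alert in the incident
    alerts: List[Alert] = field(default_factory=list)
    last_seen: float = 0.0        # timestamp of the most recent alert


def correlate(
    alerts: List[Alert], keys: Tuple[str, ...], window_s: float
) -> List[CorrelatedIncident]:
    """Merge alerts that share the same values for `keys` and arrive within
    `window_s` seconds of the previous alert with those values."""
    incidents: List[CorrelatedIncident] = []
    open_by_key: Dict[Tuple[Any, ...], CorrelatedIncident] = {}
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        key = tuple(alert.get(k) for k in keys)
        current = open_by_key.get(key)
        # Start a new incident if none is open for this key or the gap is too long.
        if current is None or alert["timestamp"] - current.last_seen > window_s:
            current = CorrelatedIncident(key=key)
            incidents.append(current)
            open_by_key[key] = current
        current.alerts.append(alert)
        current.last_seen = alert["timestamp"]
    return incidents
```

Measuring the gap from the most recent alert, rather than using a fixed window from the first one, keeps a steady stream of related alerts in a single incident for as long as the stream continues.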

Figure 1. Example of Autodesk’s incident management workflow using BigPanda’s incident management and automation platform to reduce incident volume and expedite mean time to resolution.

Curious to learn more about how your company can achieve similar results with BigPanda? We can provide a customized demo focused on your business and your exact use case, or you can check out BigPanda’s Starter Pack and get started in just two weeks through accelerated onboarding.