IHG receives actionable alerts in one place to help drive their highest service availability ever
InterContinental Hotels Group (IHG) is a global hospitality company with 18 different brands and over 6,000 hotels, representing approximately 288,000 rooms.
- Rapid growth increased operational complexities across 6,000+ hotels.
- Third-party owners relied on siloed monitoring solutions, causing low availability and poor MTTR.
- Expensive service-impacting outages negatively affected business availability.
IHG franchises, leases, manages, and owns 6,000+ hotels in over 100 countries. Over the past several years, IHG experienced strong organizational growth through new hotel build-outs, acquisitions, and technology investments that helped the company deliver best-in-class guest experiences. With more than 1,800 hotels in its development pipeline, ensuring the performance and availability of online applications and services is vital to IHG’s goal of continually exceeding the expectations of customers who interact with the brand.
The Unified Command Center (UCC) is IHG’s centralized monitoring and alerting organization, responsible for the performance and availability of key applications and services for IHG hotels and corporate offices. IHG’s decision to move to a hybrid-cloud architecture brought a surge of new event and alert data. The adoption of DevOps methodologies added another layer of IT complexity, sharply accelerating the growth of siloed application and infrastructure event data.
The UCC quickly faced an unmanageable influx of events from all of their monitoring and observability tools, which prevented them from having visibility into their IT operations. It became very difficult to separate important, actionable alerts from the event noise, causing IT incidents to pile up and outages to occur.
“Having these disparate systems increased our mean time to detect [MTTD],” explained Alvin Smith, vice president of global infrastructure and operations at IHG. “We have two measurements that are key for us: mean time to detect and mean time to resolve [MTTR]. Although MTTR is important, if we’re unable to identify that an event is taking place, then we’re at a tremendous disadvantage before we start troubleshooting an incident. We were faced with extended outages and significant impacts to our availability.”
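To make the two measurements concrete, here is a minimal sketch of how MTTD and MTTR could be computed from incident timestamps. The `Incident` record and its field names are hypothetical, and the choice to measure MTTR from the start of the fault (so that slow detection counts against resolution time, as Smith describes) is an assumption. The source does not say how IHG defines or calculates either metric.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started: datetime   # when the fault actually began
    detected: datetime  # when monitoring first flagged it
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a non-empty collection of timedeltas."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: average gap between fault start and detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to resolve, measured here from fault start (an assumption)."""
    return mean_delta(i.resolved - i.started for i in incidents)
```

Under this definition, every minute added to detection is also a minute added to resolution, which is why a long MTTD puts a team at a disadvantage before troubleshooting begins.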
IHG needed to scale the productivity of its UCC teams by reducing the burden of manually detecting and managing operational incidents.
Using BigPanda, IHG gained holistic awareness of their service operations by consolidating the events from siloed monitoring and observability tools into a single pane of glass. This removed the need to manually switch between separate tool consoles when responding to incidents.
BigPanda also significantly reduced event noise by automatically filtering out false positives and benign events, and by adding context to the remaining events to transform them into actionable alerts. This enabled the UCC to focus quickly on the important alerts related to incidents before they became outages and affected customers.
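The general idea of turning raw events into actionable alerts can be sketched in a few lines. This is an illustration only, not BigPanda's algorithm or API: the `Event` type, the `BENIGN_STATUSES` set, the `HOST_CONTEXT` lookup table, and the rule of merging events that share a host and check are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical raw event as emitted by a monitoring tool."""
    source: str  # which tool reported it
    host: str
    check: str
    status: str


# Assumed for illustration: statuses treated as noise, and a context lookup.
BENIGN_STATUSES = {"ok", "info"}
HOST_CONTEXT = {"web-01": {"service": "reservations", "owner": "ucc-web"}}


def to_alerts(events):
    """Drop benign events, merge duplicates across tools, and add context."""
    alerts = {}
    for event in events:
        if event.status in BENIGN_STATUSES:
            continue  # filter out noise
        key = (event.host, event.check)
        if key in alerts:
            # Same problem seen again, possibly by another tool: merge it.
            alerts[key]["sources"].add(event.source)
            alerts[key]["count"] += 1
            continue
        alerts[key] = {
            "host": event.host,
            "check": event.check,
            "status": event.status,
            "sources": {event.source},
            "count": 1,
            **HOST_CONTEXT.get(event.host, {}),  # enrichment step
        }
    return list(alerts.values())
```

In this sketch, three raw events (one benign, two reporting the same failing check from different tools) collapse into a single alert that already names the affected service and owning team.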
“Centralizing our operations with AIOps and BigPanda allowed us to have a much earlier MTTD, which gave us a head start to resolve operational incidents,” says Smith. “It all comes down to the availability of your services. And by leveraging AIOps, we can ensure the highest level of availability by making sure that we are aware of issues in our environment and are able to resolve them quickly. I’m proud that we achieved 99.8% availability in 2022—our best performance on record.”
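For a sense of scale, an availability percentage can be converted into an allowable downtime budget. The sketch below assumes availability is measured as simple uptime over a 365-day calendar year. The source does not state how IHG measures it, so the function name and the measurement period are illustrative assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime implied by an availability percentage over a period."""
    return period_hours * (1 - availability_pct / 100)


# Under these assumptions, 99.8% availability leaves roughly 17.5 hours
# of downtime across the whole year.
budget = downtime_hours(99.8)
```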
IHG not only significantly increased the availability of key applications and services but also reduced their IT complexity, keeping costs down and team morale high.
Focusing on defined, actionable alerts instead of event data reduces both the number of tickets created and charges from their incident managed service provider (MSP). It also provides better visibility and budget planning when working with their MSP.
Furthermore, the UCC can take on a higher volume of work with their existing resources and capacity because they are able to optimize internal processes and application architectures.
“Our culture has changed,” says Smith. “Now, instead of us going out and trying to discover where there are shadow IT groups or isolated dashboards, our internal customers are coming to us. They’re asking us, ‘How do we integrate BigPanda?’”
“This collaboration takes advantage of the fact that we have identified standards around our IT operations using events and alerts. As new teams and tools come within our UCC, we’re able to have conversations with product owners to help them understand how beneficial it is for them to configure and develop their products and services into our standardized configurations leveraging BigPanda. It keeps our costs down from both a resource and operations management perspective.”