Wix (NASDAQ: WIX) is a cloud-based web development platform that makes it simple for anyone to create beautiful, professional-looking websites. With nearly 60 million users worldwide, any outage of Wix's platform can leave millions of customers frustrated and upset.
Like most modern Operations teams, Wix has a complex monitoring stack, including Nagios, NewRelic, Pingdom, Logstash, Kibana, Sensu, BI (a proprietary metrics solution), BAM (a proprietary business monitoring system), and Chef for continuous deployment. From a monitoring perspective, Wix's goal is to achieve broad visibility into their production infrastructure and to ensure redundant monitoring coverage for business-critical systems. But with so many moving parts to monitor (alerts, warnings, changes, and metrics), it became difficult to quickly spot critical IT incidents, which were buried in monitoring noise.
To ensure that IT could quickly identify critical issues and collaborate with team members to resolve them, Wix chose BigPanda as the cornerstone of their monitoring operations.
According to Mark Sonis, Operations Architect at Wix: “Troubleshooting issues that once took me up to an hour to deal with, now take 80% less time with BigPanda. It’s much easier to spot critical issues and to stay up-to-date as those issues evolve.”
Before implementing BigPanda, Wix faced two main challenges in Infrastructure Operations:
- Major outages were often first discovered by customers and then brought to Wix’s attention via support tickets. Wix’s primary goal was to shorten time-to-detection of such outages, so that they could spot and resolve incidents before customers were affected.
- Minor outages and service problems often got lost in a sea of alerts and went undetected. With hundreds of IT alerts each day, it was often impossible to investigate all of them.
With BigPanda, the Wix team brought all of their IT alerts into one platform. Using BigPanda, they were able to:
- Automatically cluster noisy low-level alerts into a much smaller list of high-level incidents.
- Correlate IT alerts with recent code deployments and other changes.
- Easily discuss IT alerts and assign them to anyone in the company.
The ability to quickly spot, investigate, and collaborate on IT issues has helped Wix dramatically reduce their mean time to recovery (MTTR) and respond to critical issues before their customers notice them.
Today, BigPanda serves as one of Wix's primary company-wide dashboards. It notifies the entire team instantly, both visually and audibly, whenever a new critical incident occurs, which has cut acknowledgement of critical incidents to just a few seconds. New incidents are detected in BigPanda and assigned to the responsible team members within seconds, further shortening MTTR for critical issues.
Sonis, who is chiefly responsible for the stability and uptime of Wix Operations, said: “BigPanda provides tremendous value for us. It allows us to see critical issues far faster than we ever could before. No other tool unifies our entire monitoring stack into a single easy-to-use dashboard like BigPanda does. No other tool helps us to see all of our deployments and changes in the same place as our incidents. And no other tool allows us to notify our entire team to critical system outages in mere seconds – far in advance of our customers notifying support.”