At BigPanda, we’ve invested hundreds of engineer-years getting alert correlation for IT operations systems monitoring right. We build models using billions of machine alerts to identify patterns associated with healthy and unhealthy systems. It’s not voodoo, and calling it a “big data play” is as meaningless as comparing the intellectual abilities of Einstein and JoJo the chimp because they’re both primates.
What we do is automatically tag unstructured data using attributes associated with IT alerts to create a taxonomy that normalizes all datacenter events across systems, services, locations, time zones, sources, checks, and hosts. It’s unique not because it involves a lot of data but because it defines and operationalizes a universal language used by all IT alerts.
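To make the idea of a shared taxonomy concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not BigPanda's implementation: the `FIELD_ALIASES` table, the field names in it, and the `normalize` function are all hypothetical, invented to show how alerts from different monitoring tools might be mapped onto one set of tags.

```python
# Hypothetical mapping from source-specific field names to a shared tag
# vocabulary. The names here are illustrative, not taken from any real tool.
FIELD_ALIASES = {
    "hostname": "host", "host_name": "host", "host": "host",
    "service_desc": "check", "check_name": "check", "check": "check",
    "svc": "service", "service": "service",
    "dc": "location", "region": "location",
}


def normalize(raw: dict, source: str) -> dict:
    """Map one raw alert onto the shared tags, dropping unknown fields."""
    tags = {"source": source}
    for key, value in raw.items():
        tag = FIELD_ALIASES.get(key.lower())
        if tag is not None:
            # Lowercase and trim so "DB-01 " and "db-01" compare equal.
            tags[tag] = str(value).strip().lower()
    return tags


nagios_alert = normalize({"host_name": "DB-01 ", "service_desc": "CPU Load"}, "nagios")
cloud_alert = normalize({"Hostname": "db-01", "Region": "us-east"}, "cloudwatch")
```

Once both alerts carry the same `host` tag with the same value, they can be compared directly even though they came from tools with different field names.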
Think about creating a unique fingerprint for every row of machine data and using the equivalent of biometric security to simultaneously try unlocking every door in Beijing. Each fingerprint may only unlock one in a million doors… but when two fingerprints unlock the same one it’s virtually guaranteed that they’re from the same hand.
That’s step one and it’s a prerequisite for accurate correlation at scale. Step two is learning from patterns that emerge as fingerprints unlock common doors. By correlating clusters of “fingerprint matches” across companies, geographies, and types of data, we’re able to build libraries of the patterns used to remediate issues. Our goal is to create the world’s most comprehensive repository of solutions to IT issues.
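The fingerprint analogy can be sketched in a few lines of Python. This is one possible reading of it, not a description of our production system: the sketch assumes alerts are already normalized into tag dictionaries, treats each tag and value pair as a "fingerprint," and groups alerts that share at least `min_shared` fingerprints. The function names, the hashing choice, and the threshold are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations


def fingerprints(tags: dict) -> set:
    """One short hash per (tag, value) pair of a normalized alert."""
    return {
        hashlib.sha1(f"{key}={value}".encode()).hexdigest()[:12]
        for key, value in tags.items()
    }


def correlate(alerts: list, min_shared: int = 2) -> list:
    """Group alert indices whose fingerprint sets overlap enough.

    Uses a small union-find so that matches chain together: if A matches B
    and B matches C, all three land in one group.
    """
    prints = [fingerprints(alert) for alert in alerts]
    parent = list(range(len(alerts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; fine for a sketch, not for billions.
    for i, j in combinations(range(len(alerts)), 2):
        if len(prints[i] & prints[j]) >= min_shared:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(alerts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


alerts = [
    {"host": "db-01", "service": "billing", "check": "cpu"},
    {"host": "db-01", "service": "billing", "check": "disk"},
    {"host": "web-07", "service": "search", "check": "cpu"},
]
groups = correlate(alerts)
```

Here the first two alerts share a host and a service, so they fall into one group. The third shares only a check name with the first, which is below the threshold, so it stays on its own.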
Doing all of that billions of times has made us pretty good at what everyone casually calls “data science” and “machine learning,” but we think neither term is sufficient to explain what we’ve achieved. While we delight in all manner of geekery about cluster analysis, Bayesian logic, and proximity scoring, we’re more comfortable letting customer results validate our approach.
- The managed security services division of a Fortune 10 company recently announced they used BigPanda to resolve a DDoS attack that ordinarily would take a team of 30 people 15 hours to diagnose. They did it in less than an hour with two NOC engineers.
- The digital gaming arm of a Fortune 100 entertainment conglomerate used BigPanda to detect a bug in the payout algorithm used for online poker that would have cost the company a million dollars a minute. No money was lost. BigPanda detected and resolved the anomaly before the code was pushed to production.
To customers like these, what we do isn’t abstract or opaque or cliched. We solve an expensive problem they’ve had forever, one that keeps getting worse as infrastructure shifts to cloud, architectures shift to microservices, the frequency of code change shifts from weekly to real-time, and the value of technology shifts from IT enabler to business essential.
Call us JoJo or Einstein. We don’t care as long as you call before your next outage.