How Workday is using BigPanda to increase its availability SLA
It’s always great to get a shout-out from a customer, so I was thrilled to read a post on “Medium” written by our friends at Workday that talks about BigPanda!
In the post, Owen Sullivan, Software Development Manager at Workday, discusses how Workday raised their availability SLA from an already industry-leading 99.5% to 99.7%, and how one of the key drivers they see for delivering on this promise is efficient monitoring. As part of that, they recently added Prometheus to their monitoring infrastructure, gaining improved visibility into their service health with new metrics and alerts.
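To put those two numbers in perspective, here is a quick back-of-the-envelope sketch of how much downtime each SLA allows per year. It assumes a 365-day year and availability measured across all calendar time with no maintenance exclusions. Those are my simplifications for illustration, not Workday's actual SLA terms, and the function name is made up for this example.

```python
# Assumption: a non-leap year with no excluded maintenance windows.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted at a given availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


before = allowed_downtime_hours(99.5)
after = allowed_downtime_hours(99.7)

print(f"99.5%: {before:.2f} h/year")
print(f"99.7%: {after:.2f} h/year")
print(f"Budget removed: {before - after:.2f} h/year")
```

Under those assumptions, the move from 99.5% to 99.7% shrinks the yearly downtime budget from about 43.8 hours to about 26.3 hours, which is roughly 17.5 fewer hours in which things are allowed to go wrong.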
But as is always the case, while adding much-needed visibility and value to individuals and teams, these new metrics and alerts also added, in Owen’s words, “a firehose of data noise.” Owen goes on to explain:
“[these alerts] tended to stay around forever, even if they were added for a transient issue. And at the same time, Workday keeps growing and advancing our product offering… The combined effect made it difficult to know which consoles to look at in emergent situations.”
So the first step they took to address this issue was… wait for it… deploying BigPanda!
“BigPanda uses machine learning to correlate alerts into insight-rich incidents and present them in a single pane of glass. This makes it easier to respond to, and resolve, problems in our infrastructure.”
Couldn’t have said it better myself.
I could go on, but I think it would be better to hear it straight from the source. It’s much more convincing…
And you can take a peek at Owen’s new puppies!
Thanks, Owen, for the mention. At BigPanda, we’re thrilled and honored to partner with you in your success.