These were the sentiments not of newbies but of industry veterans: sharp, experienced professionals who all agreed that, even in the age of mature tools like Kubernetes or Spark, nothing is ever as simple as the docs imply. There’s a learning curve to success, and it takes cycles of testing, failure, learning, and re-implementation before you finally get it right. For teams implementing modern DevOps practices at scale, churning through a host of different tools before settling on a stable backend seems to be an all-too-common tale. We’re all chasing a shared vision of infrastructure as code… and we’re all encountering the same challenges.
Not everyone is Kelsey Hightower
This was echoed in the always-entertaining Greg Poirier’s talk, “Not everyone is Kelsey Hightower”. Poirier, the Factotum, Chief Architect, and self-described “herder of engineering” at Opsee, shared the ups and downs of Opsee’s effort to build an infrastructure that would allow the company to “continuously deliver applications, be appropriate for distributed systems and a services-oriented architecture, and would facilitate rapidly prototyping, building, and scaling new features.” The moral of his story was this: you can’t go from “zero to Kelsey Hightower overnight”. It took Poirier and his team time, patience, and more than a few hiccups to understand their exact needs and determine which tools best fit their requirements.
Every little bit of noise matters
Poirier’s talk, and the general sentiments of the attendees, underscored an important truth in sysops: in environments where there is a high frequency of change and every second of downtime has a significant – and often painful – impact on the business, every little bit of noise matters. But in modern-day ops environments – which often require a whole orchestra of tools, all pinging around the clock, to monitor the health of various parts of the IT stack – the chaos can quickly reach pandamonium.
Without a centralized tool to help consolidate, correlate, and make sense of alerts, they truly become nothing but “noise”: an annoying, messy nuisance that makes root cause analysis difficult and complicates investigation and remediation. Alert correlation steps in to help DevOps pros find meaning in the mess: parsing out each voice so that it can be clearly heard, removing alerts that say the same thing, and injecting context to identify which cry most loudly signals the fire.
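To make the idea concrete, here is a minimal sketch of what “removing alerts that say the same thing” and “injecting context” could look like in Python. It is an illustration only, not a description of any particular product: the alert fields (`source`, `host`, `check`, `severity`), the `Incident` class, and the group-by-host rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale, assumed for this sketch.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Incident:
    """One correlated incident per host (an assumed grouping rule)."""
    host: str
    checks: set = field(default_factory=set)
    sources: set = field(default_factory=set)
    severity: str = "info"
    alert_count: int = 0


def correlate(alerts):
    """Drop exact repeats, group the rest by host, keep the worst severity."""
    incidents = {}
    seen = set()
    for alert in alerts:
        key = (alert["source"], alert["host"], alert["check"], alert["severity"])
        if key in seen:
            continue  # the same tool saying the same thing again
        seen.add(key)

        incident = incidents.setdefault(alert["host"], Incident(host=alert["host"]))
        incident.checks.add(alert["check"])
        incident.sources.add(alert["source"])
        incident.alert_count += 1
        if SEVERITY_RANK[alert["severity"]] > SEVERITY_RANK[incident.severity]:
            incident.severity = alert["severity"]

    # Loudest cry first: most severe incidents at the top.
    return sorted(
        incidents.values(),
        key=lambda i: SEVERITY_RANK[i.severity],
        reverse=True,
    )
```

Feeding this a list of alert dictionaries from several monitoring tools returns one incident per host, with the contributing checks and sources attached. A real correlation engine would likely also consider time windows and service topology, which this sketch leaves out.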
Fighting alert fires with ChatOps
Another hot topic at the conference was the role of ChatOps in managing and optimizing collaboration between humans and machines. ChatOps depends on parsing alerts and distributing only actionable incidents, along with their related updates, into the relevant collaboration channels, and that in turn hinges on correlating alerts into manageable, actionable insights. If every raw, noisy alert is dumped into a chat channel, all you’ve done is migrate the problem from one platform to another! In other words, while an effective ChatOps strategy undoubtedly helps fight the fire, you first need to adjust the nozzle on the hose.
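As a rough sketch of “adjusting the nozzle”, the Python below filters already-correlated incidents by severity and decides which channel each one belongs in. Everything here is hypothetical: the incident fields (`service`, `summary`, `severity`, `alert_count`), the `channel_for` mapping, and the `#ops` fallback are assumptions for illustration, and the function only builds messages rather than calling any real chat API.

```python
# Hypothetical severity scale, assumed for this sketch.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def route_to_chat(incidents, channel_for, min_severity="warning"):
    """Return (channel, message) pairs for incidents worth a human's attention."""
    messages = []
    for incident in incidents:
        if SEVERITY_RANK[incident["severity"]] < SEVERITY_RANK[min_severity]:
            continue  # below the threshold: keep it out of the channel

        # Fall back to a shared channel when a service has no owner mapped.
        channel = channel_for.get(incident["service"], "#ops")
        text = (
            f"[{incident['severity'].upper()}] {incident['service']}: "
            f"{incident['summary']} ({incident['alert_count']} alerts)"
        )
        messages.append((channel, text))
    return messages
```

The point of the design is that the chat channel only ever sees one message per incident, not one per raw alert, so the conversation stays readable while the fire is being fought.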
Missed us at DevOpsDays? Team Panda is hitting the road again this week at FutureStack15 in San Francisco! If you’re attending the conference, drop by our booth for a free tee and learn how you can get a drone.