BigPanda blog

RESOLVE ‘22: Measuring what matters

Companies can take big strides toward “preventing preventable” incidents by minding what they measure.

What’s in a name? In Measuring what matters, one of the panels at our RESOLVE ‘22 event, the three words in the title reflect a plan successful IT Ops teams have embraced to reduce the complexity of their reporting systems—resulting in a faster path for companies to make more effective use of all the IT resources at their disposal.

The collection of speakers at the event noted that measuring what matters can create significant positive impact on the business, its processes and the job quality/enjoyment of the IT Ops personnel on the backend.

“Operational support teams are just getting bombarded with a number of alerts, notifications, et cetera,” PlayStation Director of Operations/Service Management Ben Narramore said in his opening remarks at the event. And for a lot of companies out there, the larger goal is—or should be—to study, consolidate and generally improve upon those alerts and notifications until the white noise and wasted effort they create are all but gone.

Saying goodbye to the “spray-and-pray” model

Event moderator and Blackrock 3 General Partner Rob Schnepp put it another way. “I’m interested in opinions on casting a wider net versus targeting a response with more specific information,” he said in an early question to the panelists.

Matt Maxie—a senior IT manager of automation and tools at Lumen Technologies—said in his reply those “broad nets” create undesirable outcomes all companies should work to minimize:

  • Human time wasted, with technical support teams “caught up in 30-40 minute meetings where they know it’s not their issue” but are required to maintain presence all the same.
  • System impact diminished, with teams dismissing valid, teachable/fixable incidents as white noise because they appear similar to junk notifications.
  • Mean time to resolution (MTTR) increased, resulting from the added time / human intervention needed to approach problems before they’re even fully identified and solved.
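
As a rough illustration of the last metric, MTTR is the average time between an incident being detected and being resolved. The sketch below assumes each incident is recorded as a pair of timestamps; the sample records and the function name are hypothetical and do not come from the panel.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, resolved_at).
# These values are invented purely for illustration.
incidents = [
    (datetime(2022, 5, 1, 9, 0), datetime(2022, 5, 1, 9, 45)),
    (datetime(2022, 5, 3, 14, 0), datetime(2022, 5, 3, 16, 0)),
    (datetime(2022, 5, 7, 22, 30), datetime(2022, 5, 7, 23, 15)),
]


def mean_time_to_resolution(records):
    """Return the average detection-to-resolution duration as a timedelta."""
    if not records:
        raise ValueError("at least one incident is required")
    # Sum the individual durations, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in records),
        timedelta(),
    )
    return total / len(records)


mttr = mean_time_to_resolution(incidents)
print(f"MTTR: {mttr}")
```

Tracking this number before and after noise-reduction work is one way to check whether narrower, better-targeted alerting is actually shortening resolution times.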

And that’s before considering the human frustrations that come with the territory. When an employee knows they’re wasting time and that the issue creating the time sink is avoidable, it’s easy to see how tensions can rise.

One example our presenters brought up a few times throughout the talk really resonated: being called to a fruitless task is no fun no matter what role you’re fulfilling—and getting summoned from home to do that can be downright torturous.

Why companies should put greater focus on proactive incident management

For Maxie and his team at Lumen Technologies, greater emphasis on problem management, including improving the signal-to-noise ratio in reporting, ultimately means fewer issues like those above (very much including the human frustration part).

“We aim to prevent preventable incidents from happening again,” he said. “The more data we collect during an incident and provide our teams, the more we’re able to proactively resolve or prevent many of those incidents from ever happening again.”

Of course, not all business stakeholders view IT situations from an IT perspective, and that often includes the stakeholders who control IT budgets. As Schnepp noted, this common challenge requires technically focused personnel to frame their arguments in terms of data-driven business benefit.

“It makes it so easy for me as a leader to ask for more headcount or tools when I can say, ‘Look, this outage cost us this much money,’” Mr. Narramore said. “It’s not something they can really argue with you when you can show numbers and facts with downtime and outages and the overall impact.”

In other words, every skilled communicator in the IT Ops bullpen knows the value of “what’s in it for me” (WIIFM)-style presenting. But a proactive attitude toward incident management, and the tools to support it, let them approach the conversation with the black-and-white data that changes minds in the boardroom. Combined with the more practical day-to-day benefits of a viable AIOps platform, that makes a proactive approach to incident management transformative in any IT Ops environment.

Measuring the positive human impact of better analytics

Companies can’t lose sight of improvements that make working life better for their people. That’s especially true in today’s IT hiring environment where a growing talent shortage makes finding various important skill sets increasingly challenging and companies go to great lengths to retain the people they have.

Beyond removing the need to sit on incident response or SWAT calls where they know it’s not their issue, as referenced above, better measurement and analytics help companies empower their people. The notion came up repeatedly throughout the panel in different ways, with many points converging on the same four high-level benefits:

  • Less time spent chasing false positives. As Mr. Narramore said, the financial and human costs of repeated false positives (which could be remediated with the right tools and outlook) are substantial and can grind morale down even in otherwise great workplaces.
  • More time spent resolving problems that actually need human intervention. Unlike the “urgent and unnecessary” drain of false positives, better analytics mean people spend their time approaching problems that actually rise to their skill sets and job descriptions.
  • Fewer on-call shifts where support personnel are guaranteed to come in. Narramore noted that morale within his teams improved after evolving the analytics simply because his people no longer worried about things like “I’m on call; I can’t go to dinner tonight.” A smarter outlook meant less time wasted chasing problems that might not exist.
  • Better decision-making on the fly and in the field. Both panelists were also quick to note that a viable analytics outlook (and supporting AIOps platform) doesn’t exclude human insight. Instead, it gives human actors at all levels greater ability to approach situations that arise on the job, both in terms of known challenges and the unknown quantities that arise from time to time in every company.

And this logically culminates in workplaces where people engage with their work more and enjoy what they do more to boot. Everyone works better in an environment where the employer takes active steps to reduce working frustrations. Moreover, it’s always easier to look forward to coming to work when workers know they’re solving real problems and having a real impact—not spending the workday investigating problems that end up being nothing.

Closing thoughts from our Measuring what matters panelists

In the panel’s closing comments, Schnepp drew an interesting comparison between support teams and firefighters. “When you can help teams and individuals look down the road and say, ‘Yes, when we get pinged the house is on fire to a pretty high degree,’” he said, companies have a much easier time getting people to the appropriate tasks.

Maxie advised that getting there is not an overnight process, and companies wishing to evolve their IT measuring and analytics should make sure to tackle the challenge with an appropriate cadence.

“Start small,” he said. “Pick the simple incidents, the repeatable incidents, where your teams see it and take steps A and B to resolve it. These are the kinds of scenarios that you want to automate first.”
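
One way to find those first candidates is to count how often the same incident type is resolved with the same short set of steps. The sketch below is only an illustration under assumed data: the incident log, the step names and the `automation_candidates` helper are all hypothetical, and the thresholds would need tuning for a real environment.

```python
from collections import Counter

# Hypothetical incident log: (incident_type, resolution_steps).
# Types and step names are invented for illustration only.
incident_log = [
    ("disk_full", ("clear_tmp", "restart_service")),
    ("disk_full", ("clear_tmp", "restart_service")),
    ("cert_expired", ("renew_cert",)),
    ("disk_full", ("clear_tmp", "restart_service")),
    ("db_latency", ("failover", "tune_query", "page_dba")),
    ("cert_expired", ("renew_cert",)),
]


def automation_candidates(log, min_occurrences=2, max_steps=2):
    """Rank incidents that recur and are resolved with a short, fixed runbook."""
    counts = Counter(log)
    candidates = [
        (incident_type, steps, occurrences)
        for (incident_type, steps), occurrences in counts.items()
        if occurrences >= min_occurrences and len(steps) <= max_steps
    ]
    # Most frequent incidents first: they offer the biggest time savings.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


candidates = automation_candidates(incident_log)
for incident_type, steps, occurrences in candidates:
    print(f"{incident_type}: seen {occurrences}x, steps: {' -> '.join(steps)}")
```

Incidents with long or varying resolution paths, such as the three-step `db_latency` entry here, are filtered out and left for human judgment.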

It’s solid advice that ties to a core idea: with the increasing complexity of modern IT systems, and the significant morale impact of false positives and the like, finding a better way to measure and respond to incidents is essential. Between positive business impact and quality-of-job improvements, there are few reasons not to harness the power of an AIOps platform now, before the organization’s technical complexity and IT Operations needs redouble again.

Learn more from our Measuring what matters panelists

Our webinars, panels and keynotes from RESOLVE ‘22 and beyond offer viewers a compelling mix of best practices and actionable intelligence. Watch the full presentation here and grow your business’s automation outlook—and more—with our guidance.