Posts about the incident management workflow: detection, triage, root cause analysis, escalation, and resolution.

Product News: Manage Alert Pandemonium with BigPanda

August 15th, 2016 | Blog

Wondering what the BigPanda product team has been up to lately? In our new regular blog series, we’ll provide you with everything you need to know about new product features, upgrades, integrations, and more! Here are a few of the latest additions you may not have discovered yet:

Three key themes from ServiceNow Knowledge16

May 26th, 2016 | Blog

Decompressing from an exhausting, inspirational few days at Knowledge16, the annual ServiceNow event...

From humble beginnings (my first Knowledge was a few hundred attendees in a tent in San Diego), Knowledge has become a global tour de force. This year, Mandalay Bay could barely contain more than 11,000 customers and partners (and the expo hall could barely contain more than 100 decibels of the tech equivalent of Queensryche). Getting into the keynote felt like rush hour on the subway in midtown Manhattan. 

Not all alert correlation platforms are created equal

May 23rd, 2016 | Blog

Ask yourself these questions to find the right fit in an alert correlation platform.

To maintain operational visibility in modern IT environments, companies are abandoning monolithic monitoring solutions from legacy vendors in favor of a modern set of “best of breed” monitoring tools. Today’s average IT monitoring stack consists of about 6-8 tools, including at least one from each of the following categories: systems monitoring, end user monitoring, application performance monitoring (APM), error detection, log analytics, chat, and ticketing. When service disruptions occur, operations engineers face a flood of alerts across different layers of the IT stack, with no fast way to figure out what’s really going on. Customers are left stranded, while IT professionals struggle to detect, triage, and remediate urgent issues. Downtime abounds, which hurts revenue, performance, and brand loyalty.

How alert correlation helps Dev and Ops work better together

April 28th, 2016 | Blog

This post was recently published as a guest blog by our friends at Jira Service Desk. You can find the original post here.

We all need to move fast in order to stay competitive. But the faster things move, the faster things break.

While many companies have made great strides towards automating application release and infrastructure management, automation for service assurance has been sorely lacking. That’s left Dev and Ops with a problem: how to effectively service alerts that have grown by orders of magnitude.

Part 1 of 2: The reason why Nagios is so noisy – and what you can do about it

December 1st, 2015 | Blog

If you’re struggling with a flood of Nagios alerts, this two-part blog series is for you. We’ll take a close look at the complicated relationship that IT and Ops professionals have with the monitoring tool, explain why Nagios is so noisy, and discuss a simple way you can take charge of your alerts and get the most out of Nagios.

Key takeaways from DevOpsDays Silicon Valley

November 12th, 2015 | Blog

In between sessions at last weekend’s DevOpsDays Silicon Valley, scores of attendees filled the halls, amplifying the Computer History Museum with chatter and turning it into something more akin to a high school cafeteria than a conference venue. As crowds formed to share their stories and insights with one another, a common theme quickly emerged: It just isn’t as easy as we thought it would be.

How to Use the 80/20 Rule to Turn Noisy Alerts into Actionable Intelligence

October 26th, 2015 | Blog

If you work in tech, you’ve probably heard of the Pareto principle, or, as it’s more commonly called, the 80/20 rule. According to the 80/20 rule, for many events, 80 percent of the results are generated by 20 percent of the inputs.

A little background: back in the late 1800s the Italian economist Vilfredo Pareto noticed that approximately 80 percent of the land in Italy was owned by 20 percent of the population. Not long after, Pareto also observed that 20 percent of the peapods in his garden generated 80 percent of the crop’s yield – and thus the 80/20 principle was born.
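To make the idea concrete for alert noise, here is a minimal Python sketch that finds the smallest set of alert sources responsible for a given share of total alert volume. The function name `noisiest_sources`, the `share` parameter, and the sample data are illustrative assumptions, not anything taken from BigPanda or the original post.

```python
from collections import Counter


def noisiest_sources(alert_sources, share=0.8):
    """Return the highest-volume sources that together account for
    at least `share` of all alerts (hypothetical helper)."""
    counts = Counter(alert_sources)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk sources from noisiest to quietest until the target share is covered.
    for source, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append(source)
        covered += n
    return selected


# Made-up sample: 10 alerts coming from 4 different checks.
alerts = ["disk"] * 5 + ["cpu"] * 3 + ["ping", "ssl"]
print(noisiest_sources(alerts))  # ['disk', 'cpu']
```

In this made-up sample, two of the four checks produce 8 of the 10 alerts, so those two are the natural place to start tuning thresholds.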

#Monitoringlove in Portland

June 12th, 2015 | Blog

Last year was an amazing experience, and we couldn’t wait to come back for more. BigPanda will be back at Monitorama to hear talks from leading open source developers, web operations experts, and a variety of thought leaders in the monitoring space.

5 key considerations for overcoming app and computing sprawl

May 7th, 2015 | Blog

In my last post, I discussed how enterprise application sprawl, if left unchecked, puts organizations at risk. In this post, I’m going to discuss what to do about the problem. Today, any single department within even a mid-market enterprise will have more applications deployed than was standard organization-wide just a dozen or so years ago. These apps include everything from cloud-based CRM to social media tools to AWS workloads to various big data tools to collaboration suites, and on and on and on.

Getting Started with BigPanda – The Incident Feed

May 4th, 2015 | Blog

BigPanda is an incident management platform for IT, NOC, and DevOps teams. Organize, prioritize and triage your incidents faster and more intelligently than ever before. Vastly improve your team's collaboration around Ops alerts and events. The following guide is the first in our series on getting started with BigPanda's incident feed. This BigPanda product introduction will help you to get up and running quickly so you can get back to fixing the world's broken stuff.

How 83% noise suppression saved Vlad a million dollars so far this year

April 21st, 2015 | Blog

I met Vlad in the bar in Vegas after a long day of telco NOC drudgery. He was enjoying his whisky and clearly didn’t want to be interrupted by me asking about his datacenter. I could tell he’d rather I had asked about anything else… Cat Stevens, Greek myths, Faberge eggs. Anything. I interrupted him anyway and asked what’s required to go from the three nines he referenced in his keynote to the five nines his customers demand. He winced in pain. I thought he swallowed an ice cube or his Johnnie Walker was laced with cyanide. Turns out he was deep in thought. He proceeded to share wisdom that inspired me… to drink whisky and grow facial hair.

BigPanda Hits it Big!

April 21st, 2015 | Blog

BigPanda is attending our first ServiceNow Knowledge15 event, April 21-24, at the Mandalay Bay Convention Center in Las Vegas. We’re hoping, though, that the relationships we build in Vegas don’t stay in Vegas…

Get everyone on the same page… literally

February 2nd, 2015 | Blog

Whether we practice more traditional operations processes with a 24x7 NOC and well-documented processes, or we’re embracing DevOps-styles with cross-functional teams and highly iterative methodologies, one problem we all face is the growing disconnect between our monitoring systems, the alerts they fire off, and the processes we’re using to handle operational issues. We log incidents in a ticket, but are the folks working on that ticket aware of the real-time status of the underlying incident? 

Announcing BigPanda Incident Sharing

January 8th, 2015 | Blog

We're excited to announce the release of a major new feature in BigPanda called Sharing! As you know, BigPanda intelligently clusters your noisy alerts into high-level incidents. With our new Sharing feature, it's now easy to notify and collaborate with anyone on your team about critical incidents.

Automating Incident Management

October 28th, 2014 | Blog

Data center growth over the last 15 years has created significant growing pains in terms of data center management. Tasks that once could be done manually by IT teams have hit the limits of scalability, cost, and efficiency. Enabling IT to meet these challenges comes down to one key theme: automation.

Getting Started with BigPanda – Incident Triage

October 17th, 2014 | Blog

BigPanda is an incident management platform for modern IT, Ops, and DevOps teams. With BigPanda, you will prioritize and route your incidents better and faster, while vastly improving your team's collaboration and processes. This is part 2 in a series on Getting Started with BigPanda. This guide will help you get up and running quickly and maximize the value you get out of the platform.

Getting Started with BigPanda – Incident Analysis

October 15th, 2014 | Blog

BigPanda is an incident management platform for modern IT, NOC and DevOps teams. With BigPanda, you will prioritize and route your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 3 in a series on Getting Started with BigPanda. This product introduction will help you to get up and running quickly so you can get back to hunting fail-whales and 404 errors.

Getting Started with BigPanda – Assign Incidents

October 13th, 2014 | Blog

BigPanda is an incident management platform for modern Ops environments. With BigPanda, you will prioritize and assign your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 4 in a series on Getting Started with BigPanda. This guide will help you get up and running quickly and maximize the value you get out of the platform.

Golden Age of Developers = Nightmare for Ops

September 18th, 2014 | Blog

The last ten years have brought enormous changes to production environments, driven by a best-of-breed approach to production infrastructure enabled by open source and cloud. This has been a boon for developers in terms of flexibility and productivity, but it’s also placed a new set of challenges and expectations on Ops.

The new Alerts REST API from BigPanda

September 4th, 2014 | Blog

CONNECT ALL THE THINGS! Here at BigPanda we are constantly working on adding new monitoring systems to our arsenal of out-of-the-box integrations. We already provide integration with all of the most popular monitoring systems & services. Nagios, Zabbix, Zenoss, New Relic, AppDynamics, CloudWatch, and Pingdom are all there. And there are many more – this list gets longer with every week that passes. These out-of-the-box integrations from BigPanda have many advantages:

New Relic and BigPanda = #Monitoringlove

July 8th, 2014 | Blog

Monitoring applications in production has never been easier. With only a few lines of code, you'll have New Relic installed and monitoring your application from nearly every angle. When something goes wrong, New Relic will start sending alerts. But then what? (Hint: New Relic and BigPanda together are the answer.)

Stop Managing Ops Incidents with Jira or Zendesk

May 2nd, 2014 | Blog

In many ways, incident management for DevOps is similar to typical issue tracking processes: it facilitates coordination and collaboration of daily tasks. For this reason, tools such as Jira, Zendesk, and even email are often used as solutions for incident management. But incident management faces one unique challenge that makes it different from other issue tracking processes. In addition to human-operated workflows, incident management also relies heavily on machine-driven workflows. Unfortunately, traditional issue trackers and ticketing systems cannot accommodate this with their current product mechanics.

4 Ways to Combat Non-Actionable Alerts

April 23rd, 2014 | Blog

Many alerts place an unnecessary burden on Ops teams instead of helping them to solve issues. The main problem is that most alerts are not actionable enough:

  • They point to issues that don’t require a response
  • They lack critical information, forcing you to spend time searching for more insights in order to gauge their urgency

Building a Fast Ops Incident Dashboard

April 14th, 2014 | Blog

Few things damage productivity as much as waiting. Waiting forces us to context switch, disrupts our creative momentum and eliminates our ability to experiment. Whether we are deploying a new service or troubleshooting a problem, waiting puts a heavy tax on efficient work.