Automating Incident Management

IT OPERATIONS & DEVOPS WANT TO HELP GROW THE BOTTOM LINE
IT Ops and DevOps teams in every organization are capable of focusing on revenue-generating initiatives and projects. Unfortunately, they’re held back by constant firefighting, which means they are reduced to supporting just the current state and existing/legacy applications and services. I saw this first-hand, working for […]

This is the first in a series of blog posts on Open Box Machine Learning. If you’re part of a large enterprise, you’re probably in the throes of digital transformation. If you’re in IT, you’re supporting your business by rolling out new services and apps weekly (or even daily). Meanwhile, your users expect 24×7 availability […]

Orlando, here we come! BigPanda is excited to be a Platinum Sponsor of the upcoming Gartner IT Operations and Strategy Summit, kicking off in sunny Florida on Tuesday, May 15th. We’re even more thrilled that our CEO Assaf Resnick has been chosen to be among the prestigious list of Gartner IOSS Speakers. Gartner’s theme this […]

Any organization can be defined by its operating principles. These are the fundamental norms, rules and values that represent what is desirable and positive for the group. Having well-defined principles can help an organization operate as a “community” with a shared understanding of what is right and what is wrong. It’s key that these […]

For the past year or so, productivity experts have been talking about “Inbox Zero” – a rigorous, fresh approach to email management. Inbox Zero stresses techniques to tame your inbox and keep it empty (or nearly empty) at all times. The practice promises to focus your time and attention on the most important tasks. At an […]

One of the best parts of working in the Technology industry is attending conferences and feeling the collective excitement of the attendees. The crowd shares a united sense of hope and confidence that the breakthrough technology for whatever problems we’ve been struggling with is just a keynote away. Sure, today you can watch live streams […]

In part 1 of this series we defined algorithmic alert correlation and how it works. The term “algorithmic” describes how data science applies machine learning techniques to solve alert storms, aka alert floods. There are two flavors of machine learning currently being applied to this problem: one is “black box” and the other, “open box”. BigPanda […]

The IT Operations tool stack is becoming far more complex. Teams now need a broad range of monitoring tools to quickly detect and ultimately resolve critical issues before they can inflict real damage on the business. Most large enterprises already have a host of preferred monitoring tools installed and working. It […]

Digital Enterprise Journal (DEJ) has published a state of the market study, Modernizing IT Operations for Digital Economy. The study is based on insights from more than 1,000 organizations worldwide and includes:

  • Key market pressures & management challenges driving the need for modern IT Operations
  • The role of modern IT Operations in digital transformation
  • Operational […]

In his research note “Four Steps to Turbocharge Your Major Incident-Handling Capabilities”, Gartner analyst Kenneth Gonzalez makes a compelling argument for why enterprise IT service operations teams should upgrade their incident management workflow processes. Here’s BigPanda’s perspective on the topic.
The Real Challenge: Most NOCs Aren’t Automated
Most enterprises are undergoing some form of digital […]

In today’s complex and fast-paced IT environments, Ops and DevOps teams often rely on a wide range of monitoring tools to successfully detect and investigate critical issues. While each solution may be well-suited to a particular use case or team, it can be hard to gain the overall visibility you need to quickly resolve issues. […]

Wondering what the BigPanda product team has been up to lately? In our new regular blog series, we’ll provide you with everything you need to know about new product features, upgrades, integrations, and more! Here are a few of the latest additions you may not have discovered yet:

Decompressing from an exhausting, inspirational few days at Knowledge16, the annual ServiceNow event…

From humble beginnings (my first Knowledge was a few hundred attendees in a tent in San Diego), Knowledge has become a global tour de force. This year, Mandalay Bay could barely contain more than 11,000 customers and partners (and the expo hall could barely contain more than 100 decibels of the tech equivalent of Queensryche). Getting into the keynote felt like rush hour on the subway in midtown Manhattan. 

Ask yourself these questions to find the right fit in an alert correlation platform.

To maintain operational visibility in modern IT environments, companies are abandoning monolithic monitoring solutions from legacy vendors in favor of a modern set of “best of breed” monitoring tools. Today’s average IT monitoring stack consists of about 6-8 tools, including at least one from each of the following categories: systems monitoring, end user monitoring, application performance monitoring (APM), error detection, log analytics, chat, and ticketing. When service disruptions occur, operations engineers face a flood of alerts across different layers of the IT stack, with no fast way to figure out what’s really going on. Customers are left stranded, while IT professionals struggle to detect, triage and remediate urgent issues. Downtime abounds, which negatively impacts revenue, performance, and brand loyalty.

Gartner Names BigPanda a 2016 ‘Cool Vendor’ in Availability and Performance
PALO ALTO, Calif. – May 5, 2016 – Gartner has selected BigPanda, the Alert Correlation Platform that turns the huge volumes of IT data and alerts into consolidated insights for businesses, as a “Cool Vendor.” Gartner, the world’s premier information technology research and advisory […]

This post was recently published as a guest blog by our friends at Jira Service Desk. You can find the original post here.

We all need to move fast in order to stay competitive. But the faster things move, the faster things break.

While many companies have made great strides towards automating application release and infrastructure management, automation for service assurance has been sorely lacking. That’s left Dev and Ops with a problem: how to effectively service alerts that have grown by orders of magnitude.

In between sessions at last weekend’s DevOpsDays Silicon Valley, scores of attendees filled the halls, amplifying the Computer History Museum with chatter and turning it into something more akin to a high school cafeteria than a conference venue. As crowds formed to share their stories and insights with one another, a common theme quickly emerged: It just isn’t as easy as we thought it would be.

If you work in tech, you’ve probably heard of the Pareto principle, or, as it’s more commonly called, the 80/20 rule. According to the 80/20 rule, for many events, 80 percent of the results are generated by 20 percent of the inputs.

A little background: back in the late 1800s the Italian economist Vilfredo Pareto noticed that approximately 80 percent of the land in Italy was owned by 20 percent of the population. Not long after, Pareto also observed that 20 percent of the peapods in his garden generated 80 percent of the crop’s yield – and thus the 80/20 principle was born.

Last year was an amazing experience, and we couldn’t wait to come back for more. BigPanda will be back at Monitorama to hear talks from leading open source developers, web operations experts, and a variety of thought leaders in the monitoring space.

Avoid incident storms and deliver higher-quality business services
1. Find tools that will help you reduce the number of events that generate tickets in Remedy. If every single IT event is creating a ticket in Remedy, you have a problem. Flooding your incident queue buries agents in irrelevant alerts and causes them to ignore the […]

Network Operations Center Best Practices: NOC Monitoring Tools
Adopt the monitoring strategy used by the world’s most respected IT teams by using BigPanda to reduce noise and only generate incidents for critical issues.
Every second of MTTR (Mean Time to Resolution) reduction improves your ability to compete, which means finding critical issues instantly is extremely […]

ITSM Strategy Requires Monitoring Savvy
Make managing modern apps scale. Managing IT operations is a thankless job. Nobody notices when services are up… and everyone wants your head on a platter the second they’re not. That hasn’t changed and yet everything else about delivering IT services has. Apps are more critical than ever, cloud infrastructure is more […]

If you work in IT Ops, you’ve probably been on the receiving end of a tsunami of Nagios alerts. It’s not pleasant. What happens when an IT outage is followed by hundreds of Nagios alerts?

  • Important alerts fall through the cracks
  • False alarms divert your attention from real issues
  • Seeing the big picture becomes impossible

[…]

How to structure your monitoring for better incident management
1. Understand the importance of having a modern monitoring strategy. In this age of cloud, Big Data, and the Internet of Things, where infrastructure is more complex and apps are more and more critical each day, leading companies know that a workable monitoring strategy is a […]

In my last post, I discussed how enterprise application sprawl, if left unchecked, puts organizations at risk. In this post, I’m going to discuss what to do about the problem. Today, any single department within even a mid-market enterprise will have more applications deployed than was standard – organization-wide – just a dozen or so years ago. These apps include everything from cloud-based CRM to social media tools to AWS workloads to various big data tools to collaboration suites, and on and on and on.

BigPanda is an incident management platform for IT, NOC, and DevOps teams. Organize, prioritize and triage your incidents faster and more intelligently than ever before. Vastly improve your team’s collaboration around Ops alerts and events. The following guide is the first in our series on getting started with BigPanda’s incident feed. This BigPanda product introduction will help you to get up and running quickly so you can get back to fixing the world’s broken stuff.

I met Vlad in the bar in Vegas after a long day of telco NOC drudgery. He was enjoying his whisky and clearly didn’t want to be interrupted by me asking about his datacenter. I could tell he’d rather I had asked about anything else… Cat Stevens, Greek myths, Faberge eggs. Anything. I interrupted him anyway and asked what’s required to go from the three nines he referenced in his keynote to the five nines his customers demand. He winced in pain. I thought he swallowed an ice cube or his Johnnie Walker was laced with cyanide. Turns out he was deep in thought. He proceeded to share wisdom that inspired me… to drink whisky and grow facial hair.

BigPanda is attending our first ServiceNow Knowledge15 event, April 21-24, at the Mandalay Bay Convention Center in Las Vegas. We’re hoping, though, that the relationships we build in Vegas don’t stay in Vegas…

Whether we practice more traditional operations processes with a 24×7 NOC and well-documented processes, or we’re embracing DevOps-styles with cross-functional teams and highly iterative methodologies, one problem we all face is the growing disconnect between our monitoring systems, the alerts they fire off, and the processes we’re using to handle operational issues. We log incidents in a ticket, but are the folks working on that ticket aware of the real-time status of the underlying incident? 

We’re excited to announce the release of a major new feature in BigPanda called Sharing! As you know, BigPanda intelligently clusters your noisy alerts into high-level incidents. With our new Sharing feature, it’s now easy to notify and collaborate with anyone on your team about critical incidents.

Wix (NASDAQ: WIX) is a cloud-based web development platform that makes it simple for anyone to create beautiful, professional-looking websites. With nearly 60 million users worldwide, any service outage of Wix’s platform leads to millions of frustrated and upset customers. Like most modern Operations teams, Wix has a complex monitoring stack including Nagios, […]

New Platform Enables IT Teams to Keep Up with the Thousands of Daily Alerts Arising from the Scale and Fragmentation of Modern Data Centers
Mountain View, CA – October 28, 2014 – BigPanda today formally launched the world’s first data science platform to automate IT Incident Management. BigPanda’s platform analyzes the flood of alerts that […]

Data center growth over the last 15 years has created significant growing pains in terms of data center management. Tasks that once could be done manually by IT teams have hit the limits of scalability, cost, and efficiency. Enabling IT to meet these challenges comes down to one theme: automation.

BigPanda is an incident management platform for modern IT, Ops, and DevOps teams. With BigPanda, you will prioritize and route your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 2 in a series on Getting Started with BigPanda. This guide will help you get up and running quickly and maximize the value you get out of the platform.

BigPanda is an incident management platform for modern IT, NOC and DevOps teams. With BigPanda, you will prioritize and route your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 3 in a series on Getting Started with BigPanda. This product introduction will help you to get up and running quickly so you can get back to hunting fail-whales and 404 errors.

BigPanda is an incident management platform for modern Ops environments. With BigPanda, you will prioritize and assign your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 4 in a series on Getting Started with BigPanda. This guide will help you get up and running quickly and maximize the value you get out of the platform.

The last ten years have brought enormous changes to production environments, driven by a best-of-breed approach to production infrastructure enabled by open source and cloud. This has been a boon for developers in terms of flexibility and productivity, but it’s also placed a new set of challenges and expectations on Ops.

CONNECT ALL THE THINGS! Here at BigPanda we are constantly working on adding new monitoring systems to our arsenal of out-of-the-box integrations. We already provide integration with all of the most popular monitoring systems & services. Nagios, Zabbix, Zenoss, New Relic, AppDynamics, CloudWatch, and Pingdom are all there. And there are many more – this list gets longer with every week that passes. These out-of-the-box integrations from BigPanda have many advantages:

Monitoring applications in production has never been easier. With only a few lines of code, you’ll have New Relic installed and monitoring your application from nearly every angle. When something goes wrong, New Relic will start sending alerts. But then what? (Hint: New Relic and BigPanda together are the answer.)

In many ways, incident management for DevOps is similar to typical issue tracking processes: it facilitates coordination and collaboration of daily tasks. For this reason, tools such as Jira, Zendesk, and even email are often used as solutions for incident management. But incident management faces one unique challenge that makes it different from other issue tracking processes. In addition to human-operated workflows, incident management also relies heavily on machine-driven workflows. Unfortunately, traditional issue trackers and ticketing systems cannot accommodate this with their current product mechanics.

Many alerts place an unnecessary burden on Ops teams instead of helping them to solve issues. The main problem is that most alerts are not actionable enough:

  • They point to issues that don’t require a response
  • They lack critical information, forcing you to spend time searching for more insights in order to gauge their urgency

Few things damage productivity as much as waiting. Waiting forces us to context switch, disrupts our creative momentum and eliminates our ability to experiment. Whether we are deploying a new service or troubleshooting a problem, waiting puts a heavy tax on efficient work. 

Here at BigPanda, we talk to many Ops teams. It’s an important part of our product development process, and helps us make sure that we’re focusing on the right pains for our customers. “Alert Spam” is a major recurring pain brought up by Ops teams: the constant flood of noisy alerts from your monitoring stack. […]