Those of us lucky enough to have attended the August MonitoringScape meetup at BigPanda with Adrian Cockcroft enjoyed the master of performance tuning as he waxed philosophical about why monitoring isn’t a solved problem after twenty-plus years of honing our craft. Spoiler alert: it’s difficult to solve a problem that keeps changing.
Decompressing from an exhausting, inspirational few days at Knowledge16, the annual ServiceNow event...
From humble beginnings (my first Knowledge was a few hundred attendees in a tent in San Diego), Knowledge has become a global tour de force. This year, Mandalay Bay could barely contain more than 11,000 customers and partners (and the expo hall could barely contain more than 100 decibels of the tech equivalent of Queensryche). Getting into the keynote felt like rush hour on the subway in midtown Manhattan.
You’ve solved your noisy alert problem with BigPanda. Now solve your noisy ChatOps problem with BigPanda and HipChat, thanks to HipChat’s new integrations platform, HipChat Connect.
If every incident update were to push a new message, your Ops chatrooms would quickly become more crowded than O’Malley’s Pub on St. Paddy’s Day. BigPanda now integrates with HipChat via HipChat Connect, so you can not only view the status of BigPanda incidents in HipChat, but also view incident details with links to relevant actions in the glance view beside the chat room.
More than ever, we live in an age of now. Movies on-demand. Health metrics streamed to wearables. Package delivery by drone. Any cuisine, any time.
Immediate everything enabled by technology requires new application architectures, organizational thinking, and approaches to performance monitoring.
One of the technology trends most associated with this "short attention span lifestyle" is the segmentation of services and workloads into containers.
Sam’s a father of two boys living in the bucolic LA suburb of West Covina. He’s a family-first guy who paints model military cargo planes for fun, makes award-winning paella, hates his commute, and loathes his phone between the hours of midnight and 4:00 AM.
Sam was a kid when he joined News Corp as a help desk analyst in 2000. More than 15 years later, he’s now Sr. Director of IT, managing a growing team of 30 NOC engineers, sys admins, and DBAs. Over the years, he has received more promotions than Trump on his own Twitter feed by delivering results and never wavering from two core beliefs that influence everything he does:
We’re more dependent than ever on cloud infrastructure. At work. At home. At play. But what happens when the cloud fails? Ask the more than 75 million Netflix subscribers or more than 100,000 companies that rely on Salesforce.com. They’ll tell you cloud failures are costly and painful.
Cloud-based apps and services must be available all the time… and yet they aren’t. DevOps and NOC teams responsible for maintaining their health must resolve issues immediately… and yet they can’t. On this MonitoringScape Live episode, hear from the experts why cloud monitoring is critical, why it’s hard, and what organizations are doing to help all of us live cloudier, better lives.
We all agree on what DevOps isn’t: a product, service, mineral, or celestial body. And we mostly agree to disagree on when the coveted “DevOps” badge is earned. All of which is deeply unsatisfying.
I needed an answer to the question “what is DevOps?” so I turned to that oracle in the cloud, that omniscient arbiter of global zeitgeist. Boy, was I disappointed.
Above is the first Google image result for the search “DevOps”.
Huh? 11 questions it made me ask...
We’re happy to announce that BigPanda now integrates with Catchpoint! Catchpoint is a popular cloud-based monitoring tool used by ops teams to measure availability and performance for synthetic transactions and real user web sessions. By integrating with BigPanda, Catchpoint customers can now aggregate all of their monitoring alerts in one place, intelligently clustering them to reduce alert noise and spot critical issues faster.
As a TechOps community, we’re awash in buzzwords. Most are initially used to establish geek credibility yet quickly become cliches. Take, for example, the term “DevOps”. From its inception as a Twitter hashtag used to promote a meetup in bucolic Ghent, Belgium in 2009, it began to be co-opted months later by ops teams around the world aspiring to manage infrastructure with code.
This is part two of a two-part post about using event correlation to thwart DDoS attacks. Channeling Mark Twain: it would have been shorter if I’d had more time. In the last post I described why DDoS attacks for SaaS providers are no different from performance and availability issues experienced in other domains like healthcare, finance, or retail. In this post I’ll share a customer story about a security breach that never happened… thanks to a savvy DevOps team and data science.
Why DDoS attacks aren’t just a security problem… and monitoring traffic isn’t the solution – Part One
Every company’s a target, every customer’s at risk. But the now-cliched threat of data breaches from Distributed Denial of Service (DDoS) attacks obscures a bigger threat: outages that impact not just data integrity but also profitability, brand equity, and customer retention.
The volume of attacks is growing and so is the impact of downtime. According to Akamai’s most recent State of the Internet report, DDoS attacks are a bigger threat than ever before. “The number of DDoS attacks continued to increase substantially in Q2 2015, more than doubling the number observed in Q2 2014.”
We’re adjusting to the new reality that DevOps is a compelling layover on the journey between legacy ops and self-healing infrastructure. Eliminating the cultural gap between developers and operations, the now-cliched state of IT nirvana called “DevOps”, is by no means the end goal. The goal is reliable system performance and availability without human intervention - the panacea called “NoOps”.
We’re proud to be unveiling a new concept we pioneered in the den that finally moves beyond dashboards as eye candy to a new place where IT analytics can be used to make better ops decisions. It’s called Service Health Analytics and it exposes all data from all monitoring sources in the form of configurable dashboards that can be customized, saved, and shared.
Tsunami detection. Crop dusting. Biohazard monitoring. What may sound like innuendos in the next EL James novel are also fields being revolutionized by quant jocks and smart algorithms. And yet, despite all the innovation, we technorati continue to bastardize the terms “data science”, “machine learning”, and “big data”. They’ve become lazy speak for “we’re not sure what we’re doing so we’ll hand-wave cliches until we have real technology and a business model.”
Rishi is too humble to be the CIO of a Fortune 100 bank, too busy to be the father of four, too accomplished to blog about ice cream, and too educated to love John Gray. Mostly, he's too unpredictable to fit stereotypes and too passionate about everything he does to do anything at less than full throttle.
I met Rishi this week at the Pacific Crest Global Technology Leadership Forum in Vail where he was presenting and I was lucky to be in the audience. We spent an hour together before his talk that inspired me to rescue Nepalese orphans... and eat more ice cream.
Rishi's been an IT leader since before we called it that. He has helped organizations grow and shrink and grow again. He's more scared about the state of IT today than he has ever been.
Here are excerpts from the discussion...
What is MTTR? Don’t answer with what it stands for or how you use it. The question is more philosophical than literal. For too long we’ve measured operational performance based on the number of minutes it takes to resolve an incident. The almighty trend line slopes down, then we gulp milk from the jug of IT-inflated ego like NASCAR drivers drunk on Nagios exhaust fumes.
Like the Zen riddle about one hand clapping, it’s important to first ask:
- What’s an incident?
- What does it mean to resolve one? …and (the ever-blasphemous)
- Is it unequivocally better to resolve them quickly?
ITSM is evolving thanks to new capabilities that make it easy to visualize service health based on real-time CMDB updates fed via automated change management driven by smarter monitoring infrastructure. We’re nearing a time where machines will manage machines. At BigPanda, we’re doing our part to get there quickly.
Today we announced an integration with the excellent cloud monitoring system Librato, which was recently acquired by SolarWinds. We’ve enjoyed working with the Librato team to bring the product to market and are now eagerly awaiting feedback from the loud and proud community of BigPanda+Librato users.
In 1792, the New York Stock Exchange opened its doors on Wall Street with five stocks available for trade. Today, more than 2,800 companies list on the NYSE with a combined market value of more than $15 trillion. In 223 years, everything except the name has changed.
I met Vlad in the bar in Vegas after a long day of telco NOC drudgery. He was enjoying his whisky and clearly didn’t want to be interrupted by me asking about his datacenter. I could tell he’d rather I had asked about anything else… Cat Stevens, Greek myths, Faberge eggs. Anything. I interrupted him anyway and asked what’s required to go from the three nines he referenced in his keynote to the five nines his customers demand. He winced in pain. I thought he had swallowed an ice cube or his Johnnie Walker was laced with cyanide. Turns out he was deep in thought. He proceeded to share wisdom that inspired me… to drink whisky and grow facial hair.
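For a sense of why the jump from three nines to five nines might make anyone wince, here is a minimal sketch of the arithmetic. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N. The function name `allowed_downtime_minutes` is made up for illustration and is not from Vlad's keynote or any BigPanda product.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    Hypothetical helper: 3 nines means 99.9% available,
    5 nines means 99.999% available.
    """
    unavailability = 10 ** -nines  # e.g. 0.001 for three nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 5):
        minutes = allowed_downtime_minutes(n)
        print(f"{n} nines: about {minutes:.2f} minutes of downtime per year")
```

Under those assumptions, three nines leaves roughly 8.76 hours of downtime a year, while five nines leaves a little over five minutes: a hundredfold smaller budget.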