The Human Impact of Hyperscale IT Failure

How do you reap the benefits of hyperscale IT while minimizing the risk of complex system failures that hurt your business, and, worst of all, hurt your customers?

Backed up planes

The Delta Air Lines outage has been an unfortunate event for everyone involved. It's also a sobering reminder of just how much we rely on IT to run a business and the effects, positive or negative, it can have on customers.

When disaster strikes in the world of IT, the immediate response is to get the system back up, then ask what went wrong and work through how the outage could have been prevented. I want to set aside the technology issues for a minute and instead examine the impact IT outages now have on people and our everyday lives.

Modern business is based on a series of digital interactions: online purchases, inventory management, prescription refills, mobile banking, tweets, texts, chats. All have become integral to our everyday lives. Trillions of these interactions, generating vast amounts of data, play out across global networks every day. We all take this sea of digital interactions between systems and applications for granted... until something goes wrong.

What we saw with Delta made all those below-the-surface digital transactions rise up in our collective consciousness, manifesting themselves as real-world, painful problems. Here are some preliminary numbers to show the human impact of just a few hours of downtime:

  • 1,680 flights cancelled over 2 days
  • 2,400 additional flights delayed
  • Roughly 600,000 travellers impacted (an estimate that assumes the industry-standard 83% load factor and models every flight on Boeing 737 seating capacity)
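
As a rough check on the traveller figure, the sketch below back-solves the seat count that the estimate implies. It assumes that cancelled and delayed flights are both counted and that every flight is modeled with the same aircraft; the numbers above confirm neither assumption, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the traveller estimate.
# Inputs are the figures quoted in the list above.
cancelled_flights = 1_680
delayed_flights = 2_400
load_factor = 0.83             # the 83% utilization cited above
travellers_impacted = 600_000  # the published estimate

# Assumption: the estimate covers cancelled AND delayed flights.
affected_flights = cancelled_flights + delayed_flights


def implied_seats_per_flight(travellers: int, flights: int, load: float) -> float:
    """Hypothetical helper: seats per flight needed to reach the traveller total."""
    return travellers / (flights * load)


seats = implied_seats_per_flight(travellers_impacted, affected_flights, load_factor)
print(f"{affected_flights} affected flights, about {seats:.0f} seats per flight implied")
```

Under those assumptions the estimate works out to about 177 seats per flight, which sits within the range of typical Boeing 737-800 configurations.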

And the business impact:

  • On a typical day, there are 3,600 Delta-related conversations on Twitter. On Monday, there were 43,000, mostly related to the outage.
  • Any customer delayed by more than 3 hours was issued a $200 voucher, which means lost future revenue and a higher cost of doing business.
  • Delta lost roughly 2% of their market capitalization overnight.
  • Considerable harm to the Delta brand

The reality is that these types of failures are a lot more common than you would think: United Airlines and Southwest Airlines have also suffered similar outages recently. And that is the impact of failed digital interactions on just one industry.

The loss, or delay, of these digital interactions means wasted time, lost money, and damage to brand reputation. But when it comes to the impact on people, it means a stranded maid of honor trying to get to a wedding, an anxious business traveler who misses an important meeting, or a grandparent unable to attend their grandchild's birthday. When the outcome could disrupt someone's life, visibility into these digital interactions, including their behavior, their contents, and their reliability, is critical. If you don't have this visibility, it's a roll of the dice, and that's a pretty risky proposition for everyone involved.

So, back to the root cause of the problem: The Wall Street Journal reported that Delta's troubles were caused by an electrical problem at their Atlanta headquarters around 2:30 a.m. ET, which set off a cascading series of events that exacerbated the initial problem. In an industry known for mergers and acquisitions, it's likely that the complexity of the resulting intertwined systems was a contributing factor. While there are some mitigation steps in place, it's pretty difficult to dodge every curveball that chance throws at you.

The impact that IT systems have on customers' lives should be front and center within any organization. It's not enough to measure the availability or resource utilization of individual systems; the impact on customer experience should be the ultimate metric. Modern businesses need a way to watch every transaction that makes up their digital interactions with customers. Imagine what is possible when you can see all those interactions and recognize when systemic issues are delaying individual interactions with actual humans. Knowing about problems before your customers do could allow you to deliver a consistently better experience than your competitors.

That is what we are striving to help our customers do here at ExtraHop.

Just remember: What happened to Delta could happen to you too! Outages like this are becoming more common as IT relies on legacy tools to manage modern IT complexity. Delta is just the latest company to make headlines by running into the ever-present challenge: How do you reap the benefits of hyperscale IT while minimizing the risk of complex system failures that hurt your business, and, worst of all, hurt your customers?

Focusing on the digital interactions that impact customers, not just the endpoints and applications, is a great place to start.
