On June 26, 2014, a large corporate office with thousands of employees began to experience intermittent service outages across its network. Key applications and services began failing for onsite employees while remote employees could not access internal resources. Traffic to the Internet was failing as well. The IT team was inundated with complaints.
The IT team needed to:
- Understand bandwidth consumption by user and service type, and its overall impact on other applications
- Trend utilization over time to establish standard policies for proper Internet usage
- Integrate the data with their SDN controllers so they could set automated policies for bandwidth management in real time to prevent service disruption
- Have information to plan future Internet and WAN capacity needs based on real usage by user and group
- Proactively identify and correct network outages without relying on insufficient logs, intermittent probes, or slow packet capture analysis
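The trending and proactive-detection goals above can be illustrated with a small sketch. It assumes per-interval byte counts for the uplink are already available (the case study does not say how ExtraHop exposes them), converts them to utilization, and flags stretches where the link stays near capacity. The function names, threshold, and run length are hypothetical choices for illustration, not ExtraHop features.

```python
def utilization_series(byte_counts, interval_s, capacity_bps):
    """Convert per-interval byte counts into fractions of link capacity."""
    return [(b * 8) / (interval_s * capacity_bps) for b in byte_counts]


def sustained_high_runs(series, threshold, min_run):
    """Return (first, last) index pairs where utilization stays at or above
    `threshold` for at least `min_run` consecutive intervals."""
    runs = []
    start = None
    for i, u in enumerate(series):
        if u >= threshold:
            if start is None:
                start = i  # a high-utilization run begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(series) - start >= min_run:
        runs.append((start, len(series) - 1))
    return runs


if __name__ == "__main__":
    ONE_GBPS = 1_000_000_000
    # Synthetic one-minute byte counts: 50% load, then a sustained burst near 95%.
    minutes = [3_750_000_000, 7_125_000_000, 7_125_000_000, 7_125_000_000, 3_750_000_000]
    util = utilization_series(minutes, interval_s=60, capacity_bps=ONE_GBPS)
    print([round(u, 2) for u in util])  # [0.5, 0.95, 0.95, 0.95, 0.5]
    print(sustained_high_runs(util, threshold=0.9, min_run=3))  # [(1, 3)]
```

Requiring several consecutive high intervals, rather than alerting on a single spike, is one simple way to separate sustained saturation from normal bursts; a production system would tune both parameters against real baselines.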
ExtraHop had already been delivered for an initial Proof of Value (PoV) test. Because the company had high-fidelity SPAN capabilities in place using Arista Networks' DANZ feature, ExtraHop was installed and analyzing traffic in under 20 minutes. The ExtraHop platform quickly identified the root cause of the slowdown: more than 100 employees were saturating the network by streaming video from ESPN's web service.
The IT team was aware of the USA vs. Germany World Cup match, but by drilling down into the HTTP URLs and payloads, they could verify that the match was the content actually causing the saturation. The corporate network typically consumed less than 60% of its 1 Gbps uplink, but with so many employees streaming HD video, the link was completely saturated.
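A quick back-of-the-envelope check shows why roughly 100 HD streams were enough to fill the link. The 1 Gbps capacity, the 60% baseline, and the 100 viewers come from the account above; the 5 Mbps per-stream bitrate is an assumption, since the actual bitrate of the streams is not reported.

```python
import math

# Figures taken from the case study.
LINK_CAPACITY_MBPS = 1000  # 1 Gbps uplink
BASELINE_PERCENT = 60      # typical utilization ceiling ("less than 60%")
VIEWERS = 100              # "more than 100 employees"

# ASSUMPTION: illustrative per-stream HD bitrate; not reported in the case study.
ASSUMED_HD_BITRATE_MBPS = 5


def headroom_mbps(capacity_mbps, baseline_percent):
    """Capacity left over once the normal baseline load is carried."""
    return capacity_mbps * (100 - baseline_percent) / 100


def projected_utilization(capacity_mbps, baseline_percent, viewers, bitrate_mbps):
    """Projected load as a fraction of capacity (above 1.0 means oversubscribed)."""
    baseline = capacity_mbps * baseline_percent / 100
    video = viewers * bitrate_mbps
    return (baseline + video) / capacity_mbps


def viewers_to_saturate(capacity_mbps, baseline_percent, bitrate_mbps):
    """Smallest number of concurrent streams that consumes all remaining headroom."""
    return math.ceil(headroom_mbps(capacity_mbps, baseline_percent) / bitrate_mbps)


if __name__ == "__main__":
    load = projected_utilization(LINK_CAPACITY_MBPS, BASELINE_PERCENT, VIEWERS, ASSUMED_HD_BITRATE_MBPS)
    print(f"headroom: {headroom_mbps(LINK_CAPACITY_MBPS, BASELINE_PERCENT):.0f} Mbps")  # 400 Mbps
    print(f"projected load: {load:.0%} of the uplink")  # 110%
    print(viewers_to_saturate(LINK_CAPACITY_MBPS, BASELINE_PERCENT, ASSUMED_HD_BITRATE_MBPS))  # 80
```

Under these assumptions the link would need about 110% of its capacity, and 80 concurrent streams already exhaust the 400 Mbps of headroom. A lower bitrate or baseline shifts the numbers, but a few hundred Mbps of headroom is still easily consumed by a hundred HD streams.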
Their legacy network performance monitoring and packet capture tools showed only that HTTP traffic between hundreds of sources and destinations was saturating the link. Because ExtraHop had full access to the Layer 7 HTTP payload, it singled out the identification headers that correlated directly with the game's identification numbers and reported them to the IT team. Instead of generic endpoint information, the team now had specific, actionable information about the streams causing the bandwidth saturation.
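The difference between the two views can be sketched as two aggregations over the same HTTP transaction records. The record fields, the content-ID header, and the sample traffic are all hypothetical and synthetic; this is not ExtraHop's data model. Grouping by endpoint pair spreads the load across many unrelated-looking flows, while grouping by a Layer 7 content identifier collapses it into the single stream responsible.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpTransaction:
    """One observed HTTP response. Field names are hypothetical."""
    client_ip: str
    server_ip: str
    host: str
    content_id: str  # value of a hypothetical content-identification header
    bytes_out: int


def bytes_by_endpoint_pair(txns):
    """An L3/L4-style view: byte totals per (client, server) pair."""
    totals = Counter()
    for t in txns:
        totals[(t.client_ip, t.server_ip)] += t.bytes_out
    return totals


def bytes_by_content(txns):
    """An L7 view: byte totals per (host, content ID) across all clients."""
    totals = Counter()
    for t in txns:
        totals[(t.host, t.content_id)] += t.bytes_out
    return totals


if __name__ == "__main__":
    # Synthetic sample traffic, for illustration only.
    sample = [
        HttpTransaction("10.0.0.11", "203.0.113.5", "video.example.com", "match-1234", 40_000_000),
        HttpTransaction("10.0.0.12", "203.0.113.6", "video.example.com", "match-1234", 38_000_000),
        HttpTransaction("10.0.0.13", "203.0.113.7", "video.example.com", "match-1234", 41_000_000),
        HttpTransaction("10.0.0.14", "198.51.100.9", "intranet.example.com", "doc-77", 2_000_000),
    ]
    print(len(bytes_by_endpoint_pair(sample)), "distinct endpoint pairs")  # 4
    (host, content), total = bytes_by_content(sample).most_common(1)[0]
    print(f"top content: {host} {content} -> {total / 1e6:.0f} MB")  # 119 MB
```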
To get internal systems back up and running, the IT team first blocked the ESPN video streaming service and notified employees that streaming the game to their own computers was not allowed because it was disrupting applications and services. As an alternative, the IT team reserved several conference rooms and streamed the match there. They did the same for several popular upcoming matches, culminating in the World Cup final between Germany and Argentina on Sunday, July 13. Employees even came in to work on Sunday to watch the final together, because they had watched all the previous matches together. Not only did IT and management eliminate the source of the issue, they also built and reinforced company morale in the process.
Knowing that the network was susceptible to high-bandwidth streaming, the IT team then used ExtraHop to audit internal teams' video streaming behavior. They discovered that the training and support department hosted all of its content externally and consumed a significant amount of bandwidth as a result. An analysis of usage patterns indicated that bringing those servers and content in-house would save 10 to 20% on bandwidth costs, and that IT could host the content more cheaply than the external provider while delivering a higher quality of service.
After the World Cup service disruption, IT used ExtraHop's trending of Internet usage and behavior to analyze and improve its ISP multihoming strategy, monitoring availability, performance, and cost. The analysis showed that the company's Tier 1 ISP performed only marginally better than its Tier 2 ISPs but cost twice as much. This put the IT team in the driver's seat: it could negotiate more effectively and had factual information to hold its ISPs accountable.
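The comparison behind that conclusion reduces to a few ratios. This sketch assumes per-ISP monthly cost, availability, and median round-trip time have already been collected; the `IspStats` type and every number in the example are hypothetical placeholders, not the company's measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IspStats:
    """Hypothetical per-ISP summary over a measurement window."""
    name: str
    monthly_cost_usd: float
    availability: float   # fraction of the window the link was up, e.g. 0.999
    median_rtt_ms: float  # median round-trip time to commonly used destinations


def compare(premium: IspStats, baseline: IspStats) -> dict:
    """Express the premium ISP's cost and performance relative to a baseline ISP."""
    return {
        # How many times more the premium ISP costs.
        "cost_ratio": premium.monthly_cost_usd / baseline.monthly_cost_usd,
        # Percentage reduction in median RTT (positive means the premium ISP is faster).
        "rtt_improvement_pct": 100 * (baseline.median_rtt_ms - premium.median_rtt_ms)
        / baseline.median_rtt_ms,
        # Difference in availability, in percentage points.
        "availability_delta_pts": 100 * (premium.availability - baseline.availability),
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    tier1 = IspStats("tier1", monthly_cost_usd=20_000, availability=0.9995, median_rtt_ms=28.0)
    tier2 = IspStats("tier2", monthly_cost_usd=10_000, availability=0.9990, median_rtt_ms=30.0)
    result = compare(tier1, tier2)
    print(f"cost: {result['cost_ratio']:.1f}x")  # 2.0x
    print(f"median RTT improvement: {result['rtt_improvement_pct']:.1f}%")  # 6.7%
    print(f"availability gain: {result['availability_delta_pts']:.2f} percentage points")  # 0.05
```

Framing each metric relative to a baseline provider makes a "twice the cost for a marginal gain" result easy to see and to bring into a contract negotiation.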
After the training and support department's content servers were migrated in-house, the IT team reclaimed 10 to 20% of bandwidth capacity, depending on hiring trends, new training updates, and other initiatives. The team also saved more than $120,000 annually in unnecessary hosting costs.
Human Resources reported that in the annual employee survey, a majority of employees cited the World Cup experience and IT's accommodation as an example of why they loved working for the company. In fact, both HR and various hiring managers began using the World Cup episode as a key selling point in recruiting. It was seen as a corporate differentiator that appealed to recent college graduates in a competitive market for technical talent.