The networking team for a large government agency needed to monitor network performance and ensure that all users had access to their applications. With over 40 mission-critical applications to deliver, the team was constantly fighting fires over application performance, and was often blamed when there was an outage.
The networking team began to notice HTTP errors. They started their investigation by checking for DNS errors, as these were a common issue in their hyper-dynamic environment. In this case, an application had recently experienced a service-level event that caused its virtual machine to fail over to a new server, which changed its IP address. Any client that tried to reach the application without an up-to-date DNS resolution would see an error.
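The failure mode described above, a hostname that no longer resolves to the application's current address, can be illustrated with a small check. This is only a sketch and not part of the team's tooling: the function names, the hostname, and the addresses are hypothetical, and it assumes the application's post-failover IPv4 address is already known.

```python
import socket
from typing import Callable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple.
    return {info[4][0] for info in infos}


def is_stale(
    hostname: str,
    current_ip: str,
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> bool:
    """True if resolution fails or does not include the current address."""
    try:
        addresses = resolver(hostname)
    except socket.gaierror:
        # The name did not resolve at all, so the client cannot reach the app.
        return True
    return current_ip not in addresses


# Illustration with a stubbed resolver that still returns the old address
# (hypothetical hostname and addresses, no network access needed).
stale = is_stale(
    "app.example.internal",
    "10.0.2.20",
    resolver=lambda _name: {"10.0.1.10"},
)
```

Passing the resolver as a parameter keeps the check testable without live DNS; in practice the default `resolve_ipv4` would be run on, or on behalf of, an affected client.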
To solve this problem, the networking team needed to gain visibility into the clients or check the DNS servers, and then determine which clients were impacted by the error.
- Persistent and non-invasive real-time monitoring of network and application health and performance
- Visibility into the root cause of application performance issues or service disruptions
- Ability to cross domains from networking to client monitoring, in order to isolate the impacted users and understand how to resolve the issue
Some users were unable to access an application that had recently failed over to a new host server. To isolate and solve this problem, the networking team needed a way to proactively detect when it was occurring, identify which users were impacted, and understand what in the configuration was causing it.
With the ExtraHop platform, the IT team was able to monitor the real-time health and performance of the network. They saw that the network was operating effectively, but noticed that some users were getting HTTP errors. It was clear that this was an isolated issue, not a network-wide event. What wasn't yet clear was why these clients could not access the application they were attempting to use.
Using the ExtraHop Discover Appliance and its real-time dashboards, the team was able to see these 404 errors and which clients were experiencing them. They then pivoted to the ExtraHop Explore Appliance, which offers rapid multi-dimensional search. They first looked at the clients' HTTP traffic to confirm that the clients had network access, and found that it was not an L3 or L4 issue. Investigating the 404 error code more deeply, they saw that the issue was isolated to one application server. Diving deeper into the impacted clients, they saw that it was a client-side issue caused by a DNS resolution problem.
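The step of isolating which clients were seeing 404 errors can be sketched generically as a group-and-count over HTTP transaction records. This is not the ExtraHop API or its record format; the `client_ip`, `server`, and `status` field names and the sample values are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def clients_with_status(
    records: Iterable[Mapping[str, object]],
    status: int = 404,
) -> List[Tuple[str, int]]:
    """Count responses with the given status per client, busiest first."""
    counts = Counter(
        str(record["client_ip"])
        for record in records
        if record["status"] == status
    )
    return counts.most_common()


# Hypothetical transaction records, not real captured data.
sample_records = [
    {"client_ip": "10.1.0.7", "server": "app-old", "status": 404},
    {"client_ip": "10.1.0.8", "server": "app-new", "status": 200},
    {"client_ip": "10.1.0.7", "server": "app-old", "status": 404},
    {"client_ip": "10.1.0.9", "server": "app-old", "status": 404},
]

impacted = clients_with_status(sample_records)
# impacted == [("10.1.0.7", 2), ("10.1.0.9", 1)]
```

A list like this narrows the investigation to a handful of clients, whose DNS resolution can then be checked individually.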
With ExtraHop, the networking team had access to vital information that had not been available with their NetFlow tools. They used this information to understand the full scope of the issue, identify which users were impacted, and ultimately find and resolve the source of the disruption, all with a single solution that gave them cross-domain visibility.
The team was able to respond quickly to a service disruption and eliminate user downtime for a business-critical application without having to triage across multiple teams. Because they could detect the issue and diagnose the problem with one monitoring solution, they reduced the management burden of addressing it.