
What Caused My Web Server to Crash?

Note: The ExtraHop Discovery Edition is no longer under active development or maintenance, but current license-holders may continue to use all features described in this post.

Welcome back to our Web Application Troubleshooting series, where we walk through how to diagnose common web application performance issues such as network switch reboots and devices soaking up bandwidth. In this post, we're going to solve another web application performance problem, an overloaded server, using wire data analytics.

You might use some sort of basic server monitoring already, but there can still be delays between the time a server crashes and when the IT team is notified. More commonly, a web or database server doesn't crash but just responds to requests slowly because of heavy load. By observing the communications between servers, you can quickly detect crashed or overloaded servers and fix the problem.

Suppose a database server is shared among several other applications in addition to your web application, and a development team member pushes out an ill-advised bit of code that soaks up the database server's CPU resources. Now your web application is responding slowly, but how can you find the root cause? Let's look at how you'd go about this type of investigation using the Discovery Edition.

Eavesdropping on Your Web Server's Conversations

When trying to identify a crashed or overloaded server using the ExtraHop Discovery Edition, the best place to start is the Summary page, which displays a high-level overview of activity at various tiers of your web application stack, including network, web, database, storage, authentication, and domain name services.

[Screenshot: Summary page]

From the Summary page, look for abnormalities in activity. Make sure to use the Time Interval options to make variances more apparent. In this scenario, we have noticed a suspicious dip in traffic. Narrowing the time interval, we can see that HTTP traffic suddenly disappears for a period of several minutes. To investigate, we click HTTP on the Bytes by L7 Protocol chart.
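To make the idea of "a dip that lasts several minutes" concrete, here is a minimal sketch of how you might flag such a gap yourself if you had per-minute HTTP byte counts exported to a list. It is not how the Discovery Edition works internally; the function name, the threshold values, and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def find_traffic_gaps(samples, threshold_ratio=0.1, min_length=3):
    """Return (start, end) index pairs where traffic stays below
    threshold_ratio * median for at least min_length consecutive samples.

    Hypothetical helper: the thresholds are illustrative defaults.
    """
    if not samples:
        return []
    cutoff = median(samples) * threshold_ratio
    gaps = []
    start = None
    for i, value in enumerate(samples):
        if value < cutoff:
            if start is None:
                start = i  # a low-traffic run begins here
        else:
            if start is not None and i - start >= min_length:
                gaps.append((start, i - 1))
            start = None
    # Handle a run that is still open at the end of the series.
    if start is not None and len(samples) - start >= min_length:
        gaps.append((start, len(samples) - 1))
    return gaps


# Made-up per-minute HTTP byte counts with a five-minute dropout.
http_bytes_per_minute = [900, 950, 870, 910, 0, 0, 0, 0, 5, 880, 920, 940]
print(find_traffic_gaps(http_bytes_per_minute))  # [(4, 8)]
```

Comparing against the median rather than the mean keeps the baseline from being dragged down by the outage itself, which is why a short gap still stands out.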

[Screenshot: L7 protocols]

Here, we see a list of all devices communicating over HTTP. The first two devices in the list stand out because their volume of HTTP traffic is significantly higher than the others, and also because their inbound and outbound traffic counts are inversely correlated: one device is mostly receiving while the other is mostly sending. One client is requesting 70 GB of HTTP data from a web server. There is definitely something suspicious going on here.
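The same reasoning (find the heaviest HTTP talkers, then check whether one device's inbound volume lines up with another's outbound volume) can be sketched in a few lines. This is only an illustration under stated assumptions: the device names, byte counts, and the 5% tolerance are hypothetical, and the dictionaries stand in for whatever export of per-device counters you have available.

```python
from itertools import combinations


def top_talkers(devices, n=2):
    """Return the n devices with the most total HTTP bytes."""
    return sorted(
        devices,
        key=lambda d: d["bytes_in"] + d["bytes_out"],
        reverse=True,
    )[:n]


def _close(x, y, tolerance):
    """True if x and y are nonzero-scale and within `tolerance` of each other."""
    biggest = max(x, y)
    if biggest == 0:
        return False  # two idle counters are not an interesting match
    return abs(x - y) / biggest <= tolerance


def mirrored_pairs(devices, tolerance=0.05):
    """Find device pairs where each one's inbound matches the other's outbound."""
    pairs = []
    for a, b in combinations(devices, 2):
        if _close(a["bytes_in"], b["bytes_out"], tolerance) and _close(
            a["bytes_out"], b["bytes_in"], tolerance
        ):
            pairs.append((a["name"], b["name"]))
    return pairs


# Hypothetical counters: one client pulling roughly 70 GB from one web server.
devices = [
    {"name": "client-a", "bytes_in": 70_000_000_000, "bytes_out": 400_000_000},
    {"name": "web-server", "bytes_in": 400_000_000, "bytes_out": 70_000_000_000},
    {"name": "workstation-1", "bytes_in": 120_000_000, "bytes_out": 3_000_000},
    {"name": "workstation-2", "bytes_in": 45_000_000, "bytes_out": 1_000_000},
]

print([d["name"] for d in top_talkers(devices)])
print(mirrored_pairs(devices))  # [('client-a', 'web-server')]
```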

Clicking the first device—the one requesting 70 GB of HTTP data—pulls up metrics specific to that device.

[Screenshot: first device]

Click the Protocol Breakdown tab and hover over the chart to see the exact traffic measurements for each L7 protocol at specified times. We can see that even when HTTP traffic dropped off, this device was still online and communicating, receiving over 7 MB of HTTP traffic. Because this device was functioning during the time period under investigation, we close this window and look at the web server that sent 70 GB of HTTP data during this time period. Clicking the Protocol Breakdown tab for the web server tells a different story.

[Screenshot: second device]

During the same time period, the web server stopped communicating not only over HTTP but all other protocols as well. This indicates that the device lost network connectivity, rebooted, or crashed. In any case, by viewing L2-L7 communications for all systems, we were able to quickly isolate the problem server with just a few clicks.
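The distinction we just drew (a device that is quiet on one protocol versus a device that is silent on every protocol) is easy to express as a rule. The sketch below assumes you have per-protocol byte counts for each device during the suspect window; the function name, labels, and sample values are hypothetical and only mirror the reasoning above.

```python
def classify_device(protocol_bytes, protocol="HTTP"):
    """Classify a device from its per-protocol byte counts in one time window.

    Hypothetical helper. Silence on every protocol suggests the device lost
    connectivity, rebooted, or crashed; the counters alone cannot tell which.
    """
    if sum(protocol_bytes.values()) == 0:
        return "silent on all protocols"
    if protocol_bytes.get(protocol, 0) == 0:
        return f"online, no {protocol} traffic"
    return f"online, {protocol} active"


# Made-up byte counts for the few minutes when HTTP traffic dropped off.
client_window = {"HTTP": 7_000_000, "DNS": 20_000, "SSH": 5_000}
web_server_window = {"HTTP": 0, "DNS": 0, "SSH": 0}

print(classify_device(client_window))      # online, HTTP active
print(classify_device(web_server_window))  # silent on all protocols
```

A device that lands in the middle category (online but with no HTTP traffic) would point toward a problem with the web service itself rather than with the host or its network connection.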

Note: The ExtraHop Discovery Edition shows summary metrics for specific devices, but with the full ExtraHop Enterprise Edition, you can drill down to see transaction details for those devices, such as error counts, methods used, users, and files accessed.

In summary, a web server lost connectivity, rebooted, or crashed for several minutes. We could very quickly diagnose the problem with visibility into application-layer (Layer 7) communications on the wire. This view identifies not only the server experiencing problems, but also applications or users that might be causing a problem.

Be Prepared—Listen to Your Wire Data

So there you have it. The free ExtraHop Discovery Edition provides you with the ability to listen in on what your servers are communicating to each other on the wire. Our Web Application Troubleshooting Guide shows how to use the Discovery Edition to troubleshoot common performance issues. What are you waiting for? Download the free virtual appliance and guide today so that you're prepared for the next inevitable web application issue. If you're not ready to download the virtual appliance yet, you can always kick the tires with our free online demo (no sign-up required!).

Stay tuned for our next post on misconfigurations and application errors.
