What happens when your CFO can't send email because Microsoft Exchange has stopped responding? Given how central email is to any modern enterprise, an outage like that is beyond frustrating for end users and the network team alike. How would you like to be the one sweating bullets while diagnosing it?
For today's installment of War Stories, we'll walk through an Exchange troubleshooting case we ran into recently and show how ExtraHop saved the day. It started with a flood of users calling in about Exchange latency (yes, including the CFO), so the network team got onto the ExtraHop system right away to look for the likely culprit.
Exchange, being a mission-critical system, sat behind a firewall in the DMZ. Luckily, the ExtraHop box was monitoring both sides of the firewall and could see exactly what was happening to the Exchange server as well as to other related applications. The first thing that jumped out when the team looked at this firewall was a series of huge hourly spikes in TCP retransmission timeouts (RTOs). Tracking down the source of the RTOs led them to a data warehousing application on the other side of the DMZ. It was "sucking down" a surprisingly large volume of data across the firewall: tens of gigabytes every hour.
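The RTO triage described above boils down to two questions: which hours spike, and which host is responsible within a spiking hour. The sketch below is a generic illustration, not ExtraHop's method or API. It assumes RTO events have already been exported as hypothetical `(timestamp, peer)` pairs, and the host names and the 3x-median spike threshold are made up for the example.

```python
import statistics
from collections import Counter
from datetime import datetime


def hour_bucket(ts):
    """Truncate a timestamp to the start of its clock hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def rtos_by_hour(events):
    """Count RTO events per clock hour; events are hypothetical (datetime, peer) pairs."""
    return Counter(hour_bucket(ts) for ts, _peer in events)


def spike_hours(per_hour, factor=3.0):
    """Return hours whose RTO count exceeds `factor` times the median count
    over the hours that recorded any RTOs (the threshold is an assumption)."""
    median = statistics.median(per_hour.values())
    return sorted(hour for hour, count in per_hour.items() if count > factor * median)


def top_rto_sources(events, hour):
    """Rank peers by how many RTO events they account for within one clock hour."""
    return Counter(peer for ts, peer in events if hour_bucket(ts) == hour).most_common()


# Hypothetical sample: steady background RTOs from two mail clients every hour,
# plus a burst from an illustrative data warehouse loader during the 10:00 hour.
SAMPLE = []
for h in range(8, 12):
    SAMPLE.append((datetime(2024, 1, 1, h, 15), "mail-client-17"))
    SAMPLE.append((datetime(2024, 1, 1, h, 45), "mail-client-03"))
SAMPLE += [(datetime(2024, 1, 1, 10, m), "dw-loader-01") for m in range(10)]

if __name__ == "__main__":
    per_hour = rtos_by_hour(SAMPLE)
    for hour in spike_hours(per_hour):
        print(hour, "spike; top sources:", top_rto_sources(SAMPLE, hour)[:3])
```

On this sample data only the 10:00 hour is flagged, and `dw-loader-01` tops its list. Using the median rather than the mean as the baseline keeps a single large burst from inflating the threshold it is judged against.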
A quick conversation with the application and data warehouse teams established that the heavy logging and the hourly bulk transfers into the data warehouse were overwhelming this particular firewall: the Exchange server and other applications shared the same firewall interface that the data warehouse used to move large volumes of data every hour. This, of course, led to many downstream problems, the most visible being the Exchange latency. The network team was happy that ExtraHop pinpointed the problem so quickly, which let them find a workaround that met both business needs: archiving critical business data and keeping electronic communication flowing.