Cross-Tier Visibility for DBAs

How do my DBAs, who run thousands of database servers, ensure the fastest response times for every query?

The Problem

A SaaS provider operated in a highly competitive environment in which revenue was tied directly to application speed. Their applications used thousands of database servers, pushing out 10 terabytes of SQL queries each day. And every day, their database administrators faced the challenge of avoiding slowdowns and of quickly pinpointing issues among the myriad web, application, web service, and database tiers.

According to KISSMetrics (based on surveys by Gomez and Akamai), a one-second page delay can result in a 7% reduction in conversions. For an e-commerce site that makes $100,000 a day, that delay can potentially cost $2.5 million in lost sales a year.
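The arithmetic behind that figure can be checked with a short sketch. It assumes that lost sales scale linearly with lost conversions and that the site trades 365 days a year; neither assumption is stated by the cited source, and the function and constant names are illustrative only.

```python
# Rough check of the cited figure. Assumptions (not from the source):
# revenue falls in proportion to conversions, and the site sells 365 days a year.
DAILY_REVENUE = 100_000      # dollars per day, from the example above
CONVERSION_DROP = 0.07       # 7% fewer conversions per one-second delay
DAYS_PER_YEAR = 365


def annual_loss(daily_revenue: float, conversion_drop: float,
                days: int = DAYS_PER_YEAR) -> float:
    """Estimate yearly lost sales for a sustained conversion drop."""
    return daily_revenue * days * conversion_drop


loss = annual_loss(DAILY_REVENUE, CONVERSION_DROP)
print(f"Estimated annual loss: ${loss:,.0f}")
```

Under these assumptions the estimate comes to roughly $2.55 million a year, in line with the quoted $2.5 million.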

The SaaS provider had limited options. Running a database profiler or SQL trace continuously in production could add significant overhead (up to 147% with SQL Trace, and 19% with a database profiler, according to some MSDN studies). Correlating events across tiers was difficult and time-consuming.

Desired Outcome

  • Continuous database profiling with no overhead.
  • Total visibility into activity for the web servers, application servers, databases, and storage, with the ability to drill down into details.
  • Early warning of database issues based on trends and historical performance data.

The Solution

First, the database administrators used the ExtraHop platform to pinpoint slowdowns to a specific tier: web, application, database, or web service. They set up a cross-tier dashboard that mapped out all of the infrastructure components that made up an application. In one view, the team now had access to real-time information about the traffic in each tier. Information such as web server return codes, errors, URIs, and response rates, as well as application server methods, errors, and network conditions, was correlated with database query times, database methods, database response rates, and processing time. For the first time, the database administrators could see whether an issue was caused by the database at all.

In another set of charts, the database administrators used the Total Query Time view for databases. This built-in metric let them easily customize analysis to determine the total weight of the SQL queries. While the view is easy to implement, the behind-the-scenes calculation is quite complex. ExtraHop retrieves queries and methods off the wire, counts the number of times a query was run, multiplies this value by the time required to return the data, and returns a number to the dashboard. All of this information is pulled off the wire, with no impact on the database itself.
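The weighting described above (run count multiplied by the time to return data) can be illustrated with a minimal sketch. This is not ExtraHop's implementation or API: the `QueryObservation` type, the `total_query_time` function, and the use of mean response time as "the time required to return the data" are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryObservation:
    """One observed query (hypothetical record, not an ExtraHop type)."""
    statement: str
    response_ms: float


def total_query_time(observations):
    """Rank statements by run count multiplied by mean response time."""
    counts = defaultdict(int)
    time_sums = defaultdict(float)
    for obs in observations:
        counts[obs.statement] += 1
        time_sums[obs.statement] += obs.response_ms

    totals = {}
    for statement, run_count in counts.items():
        mean_ms = time_sums[statement] / run_count  # assumed "time to return data"
        totals[statement] = run_count * mean_ms     # total weight of the query

    # Heaviest statements first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A fast query run 1,000 times versus a slow query run once.
observed = [QueryObservation("SELECT price FROM items WHERE id = ?", 2.0)] * 1000
observed.append(QueryObservation("SELECT * FROM orders", 500.0))
ranked = total_query_time(observed)
```

In this example the 2 ms query ranks first because it accounts for 2,000 ms in total, against 500 ms for the single slow query. That is the kind of weighting a per-query latency view alone would not surface.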

When alerts fired for web services, administrators could now quickly rule out tiers that were performing normally and drill down for errors on the others. The final piece of the solution was the ability to drill down into the database environment whenever the SQL dashboard showed an increase in response time, and to see the actual queries, methods, and errors.

User Impact

The organization reported a large reduction in the mean time to recovery during outages for their fiscal year. The ExtraHop platform offered the database administrators visibility into the database infrastructure and a completely new way to work. They now had detailed information on method and query timing and on payload, all without installing database profilers, and could also see relationships among the database, application servers, web servers, and networks. The cross-tier dashboard allowed them to correlate services that previously could be diagnosed only one at a time or through the insights of individual administrators.