An insurance company needed a better way to monitor the database systems in its production environment. Considered the most critical part of the infrastructure, production database behavior was also the least understood. Isolating database issues took weeks of costly forensic work, and by the company's own estimates those issues caused more than 15 percent of all performance problems and application outages.
Traditional database profiling and agent-based tools can increase performance overhead by as much as 147 percent, according to several MSDN studies. That performance cost, combined with expensive per-server licensing fees, made profiling databases in production impractical and risky. These tools also did nothing to correlate the dependencies between the databases, infrastructure, network, clients, and applications for fast remediation, continuous improvement, and tuning.
Because "do nothing" was not an option, the Director of Applications and Infrastructure relied heavily on internal probe data—synthetic transactions designed to simulate real world database queries. However, the periodic nature of synthetic transactions meant that intermittent problems were missed and representing real-world requests and scenarios seemed impossible. Applications and client behavior were changing too frequently and real traffic patterns rarely matched simulated test patterns.
Database troubleshooting often meant reconstructing a representative picture of traffic through queries. The database administrators had neither the access nor the skills to capture real traffic off the wire to validate success, and network administrators and DBAs joined forces only for complicated forensic analysis. What the organization needed was the ability to:
- Profile the databases in real time with zero overhead
- Continuously observe all errors, query response times, rates, and volumes
- Analyze all database requests and responses across Microsoft SQL Server, Oracle, DB2, and Informix systems
- Correlate database transactional behavior with all other infrastructure, application, and storage components
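To make the second requirement concrete, the roll-up involved can be sketched in a few lines of Python: aggregating passively observed database transactions into per-method error, latency, rate, and volume metrics. The `Transaction` record and its field names are hypothetical illustrations, not ExtraHop's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Transaction:
    """One database request/response pair observed on the wire (hypothetical shape)."""
    method: str         # normalized query text or stored-procedure name
    duration_ms: float  # server processing time
    size_bytes: int     # request plus response size
    is_error: bool      # whether the server returned an error response

def summarize(transactions, window_seconds):
    """Roll observed transactions up into per-method metrics for one time window."""
    stats = defaultdict(lambda: {"count": 0, "errors": 0, "total_ms": 0.0, "bytes": 0})
    for t in transactions:
        s = stats[t.method]
        s["count"] += 1
        s["errors"] += t.is_error
        s["total_ms"] += t.duration_ms
        s["bytes"] += t.size_bytes
    return {
        method: {
            "rate_per_s": s["count"] / window_seconds,   # transaction rate
            "error_rate": s["errors"] / s["count"],      # fraction of errors
            "avg_ms": s["total_ms"] / s["count"],        # mean response time
            "volume_bytes": s["bytes"],                  # traffic volume
        }
        for method, s in stats.items()
    }
```

Because the aggregation reads traffic that was captured passively, the monitored databases themselves do no extra work, which is the point of the zero-overhead requirement.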
The first step was to adopt the passive database profiling features of the ExtraHop platform. The database administrators quickly learned that ExtraHop allowed unlimited analysis without imposing a performance penalty on the databases. During the initial deployment, a problem involving high database load occurred. Without any custom configuration, the team drilled into that database's details using the Transactions by Method per Second chart, which revealed that a method that should never run sat at the top of the list: select * from t_product.
Together, the Transactions by Method, throughput, server vs. network response time, and per-query processing time metrics told an important story on a second-by-second basis. For the first time they possessed the insight and factual data to troubleshoot database-caused latency issues in moments rather than weeks.
Database method monitoring had already proven itself during deployment, so the Transactions by Method chart became a regular part of the database administrators' dashboard. The chart not only highlighted offenders but also showed when the profile of the database queries was changing, indicating an application change or rollout and signaling a need to pay closer attention. The Director of Applications said, "Our team is now armed with wire data that doesn't just eliminate the costly and inefficient war room process, it actually has improved our whole IT workflow and processes."
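The "query profile is changing" signal described above can be approximated by comparing each method's share of total transactions across two observation windows. The function name and threshold below are illustrative assumptions, not how ExtraHop computes the chart.

```python
def profile_shift(baseline, current, threshold=0.10):
    """Flag methods whose share of total transactions moved by more than
    `threshold` between two windows, plus methods that are brand new.
    Inputs map method name -> transaction count for that window."""
    def shares(counts):
        total = sum(counts.values()) or 1
        return {m: c / total for m, c in counts.items()}
    base, cur = shares(baseline), shares(current)
    flagged = {}
    for method, share in cur.items():
        delta = share - base.get(method, 0.0)
        if method not in base or abs(delta) > threshold:
            flagged[method] = delta  # positive: growing share; negative: shrinking
    return flagged
```

A sudden appearance of a new method, or a large swing in an existing one, is exactly the kind of shift the team treated as a cue that an application change or rollout had landed.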
The IT organization estimates it is saving a minimum of $350,000 a year in personnel time. Not only has the team stopped spending resources on unproductive war room sessions, but cross-team collaboration has improved dramatically. Within the first month of owning ExtraHop, the database team launched an initiative to tune the worst-performing and most frequently used 20 percent of queries and to cut their error count by 50 percent. Within three months they achieved both goals, because for the first time they knew exactly what, where, when, and how frequently these events occurred.
They also set up advanced trending and alerts to catch database problems before they become outages. The Director of Applications reports that in the last six months there have been zero database-caused outages or performance slowdowns.
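Trend-based alerting of this sort can be sketched as a comparison of the current window against a trailing baseline. The function, the alert factor, and the minimum-history requirement below are illustrative assumptions, not ExtraHop's alerting logic.

```python
def should_alert(history, current, factor=2.0, min_samples=5):
    """Alert when the current window's error count exceeds `factor` times
    the trailing average, provided there is enough history to trust the
    baseline. `history` is a list of per-window error counts."""
    if len(history) < min_samples:
        return False  # not enough data to establish a trend
    baseline = sum(history) / len(history)
    return current > factor * baseline
```

Comparing against a trailing average rather than a fixed threshold lets the alert track each database's normal behavior, so a query mix that is noisy by nature does not page anyone, while a genuine spike does.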