The ExtraHop team was conducting on-site training recently and brought back a great validation of the importance of monitoring scalability: A SaaS provider for the healthcare industry is analyzing over 5 billion SQL transactions each day on a single ExtraHop appliance (and on one of our older hardware platforms at that). I'll explain why this is important later in the post, but first let me digress slightly.
It's become somewhat faddish for monitoring vendors to talk about how many nodes or transactions their solution is monitoring for customers. But while everyone's boasting about monitoring at scale, the crucial question you should be asking is "What are you monitoring at scale?"
At ExtraHop, we're providing roughly 2,500 metrics out-of-the-box in addition to the custom metrics that IT teams define themselves using our programmable platform. This is not just flow analysis or header data—it includes transactional metrics that can only be uncovered after recreating the TCP state machines for every sender and receiver, reassembling packets into full streams, and analyzing the bi-directional transaction payload and content from L2-L7. Armed with this information, organizations can derive deep insights into their technology environments and business.
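To make the stream-reassembly step above concrete, here is a minimal sketch of the general technique, not ExtraHop's implementation. It assumes one direction of a single TCP connection, segments already captured as (sequence number, payload) pairs, and no sequence-number wraparound. The function name `reassemble` and the sample data are hypothetical.

```python
def reassemble(segments, initial_seq):
    """Rebuild one direction of a TCP byte stream from captured segments.

    segments: iterable of (seq, payload) pairs, possibly out of order and
    possibly containing retransmissions.
    initial_seq: sequence number of the first payload byte.
    Assumption: sequence numbers do not wrap around in this capture.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # gap: a segment is missing, so stop at the last contiguous byte
        overlap = next_seq - seq  # bytes of this payload we already have
        if overlap < len(payload):
            stream += payload[overlap:]
            next_seq = seq + len(payload)
    return bytes(stream)


# Out-of-order arrival with one retransmitted segment (hypothetical data).
captured = [(1003, b"ECT"), (1000, b"SEL"), (1006, b" 1"), (1000, b"SEL")]
query = reassemble(captured, initial_seq=1000)
```

Only once the payload is contiguous like this can a protocol parser pull out transaction-level details such as the SQL statement, which is why header-only or flow-level analysis cannot produce the same metrics.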
Read our CEO's post: Under the Hood: How ExtraHop Delivers 20Gbps of Real-Time Transaction Analysis
Making Sense of 5 Billion SQL Transactions
The SaaS provider in question helps more than 2,500 hospitals and 9,000 clinics manage payments. For example, when several insurers each pay a percentage of the cost of an operation, the hospital uses this service to keep track of who has paid what. In total, the SaaS provider tracks payments made on behalf of nearly 100 million individuals.
Each time one of the SaaS provider's services needed to reference an individual, it performed a lookup on a single database storing the unique internal identifiers for these individuals. This crucial database was responding slowly, but the reason was unclear.
Monitoring shared resources like databases and storage is important but incredibly difficult because IT teams often lack holistic visibility. DBAs and other IT team members need to know if developers are doing bad things to their production systems, or if a rogue application is soaking up all available capacity. When there are 5 billion SQL transactions each day, the challenge is even greater. That's why ExtraHop's scalability is so valuable for demanding environments.
With ExtraHop analyzing a copy of its network traffic, the IT team at the SaaS provider could see real-time metrics for all database transactions, including methods used, stored procedures executed along with processing times, and errors. With this visibility, they saw that their database instance was very busy, but still holding up relatively well. The real culprit was some thrashing occurring on the underlying storage system, which the IT team confirmed with ExtraHop's CIFS analysis.
In addition to monitoring their database for performance, this particular IT organization also uses ExtraHop to detect potential data leakage. They use ExtraHop's LDAP analysis to see when Active Directory user accounts make SQL queries against sensitive databases, and they send real-time alerts to their SIEM platform if that happens. Normally, only service accounts originating from their applications should be querying these databases.
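The data-leakage rule described above boils down to a simple check: alert when an account that is not a known service account queries a sensitive database. The sketch below illustrates that logic only. It is not ExtraHop's trigger API or the customer's configuration, and every name in it (`SqlTransaction`, `find_suspicious`, `to_siem_event`, the account and database names) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; a real deployment would load these
# from its own directory and database inventory.
SENSITIVE_DATABASES = {"patient_ids"}
SERVICE_ACCOUNTS = {"svc_billing", "svc_claims"}


@dataclass(frozen=True)
class SqlTransaction:
    user: str        # account that issued the query
    database: str    # database the query ran against
    statement: str   # SQL text recovered from the wire


def find_suspicious(transactions):
    """Return queries against sensitive databases from non-service accounts."""
    return [
        t for t in transactions
        if t.database in SENSITIVE_DATABASES and t.user not in SERVICE_ACCOUNTS
    ]


def to_siem_event(t):
    """Shape a flagged transaction as a generic alert record (assumed format)."""
    return {
        "severity": "high",
        "user": t.user,
        "database": t.database,
        "statement": t.statement,
        "reason": "non-service account queried sensitive database",
    }


observed = [
    SqlTransaction("svc_billing", "patient_ids", "SELECT id FROM people WHERE ssn = ?"),
    SqlTransaction("jdoe", "patient_ids", "SELECT * FROM people"),
    SqlTransaction("jdoe", "reporting", "SELECT count(*) FROM visits"),
]
alerts = [to_siem_event(t) for t in find_suspicious(observed)]
```

In this sample, only the second transaction would be flagged: `jdoe` is an interactive user account touching a database that normally sees application traffic alone.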
Interested in learning more about monitoring databases at scale? Read this case study: Concur Optimizes Database and Memcache Performance for Competitive Advantage