Achieved a 20 percent application performance improvement through a 1.4TB memcache deployment
Expanded caching usage from 13,000 hits per day to more than 500 million hits per day
Identified optimal SQL workloads for caching and improved SQL query responses by as much as 10x
For Concur, optimizing the efficiency of its application infrastructure means building a competitive advantage. The largest SaaS provider in the ERP market, Concur is able to deliver a better product at lower prices because of its relentless pursuit of performance and scale.
"Concur processes more than $50 billion in travel and expense reports each year—roughly 10 percent of the worldwide total," says Drew Garner, Director of Architecture Services at Concur. "As a software-as-a-service product, our pricing is directly tied to how much it costs to process each expense report.We have to be able to serve a transaction tomorrow with fewer resources than today. If we don't do that, we'll get beaten by the competition because they'll figure out how to do it first."
Seeking greater scalability and speed, Concur decided to replace its homegrown caching system with memcache. To identify the best candidates for migration to memcache, the R&D Operations team at Concur needed to analyze SQL query performance across thousands of databases. The team also needed to monitor memcache performance and correlate it with activity at other tiers of the application infrastructure.
To optimize the database and caching tiers of its application, Concur worked with ExtraHop to implement memcache analysis. The ExtraHop system provides real-time transaction analysis at wire speed—up to a sustained 10Gbps—that covers the network, web, database, and storage tiers of the application.
"Concur stores 52 million items in 1.4 terabytes of memcache with sub-millisecond access and response times, but there is no way to query the system to find a particular key without dramatically impacting performance," explains Garner. "ExtraHop provides this visibility by passively analyzing transactions as they pass over the network."
In one case, the R&D Operations team used the ExtraHop system to find specific memcache keys that were not stored because they exceeded the default 1MB limit. "ExtraHop was the only product that could detect the problem and find the offending keys. With this specific information, we could apply compression in the application to fix the problem," says Garner. "Usually, people monitor memcache with server-side and client-side metrics, but there is a lot of activity in the middle that is crucial. With ExtraHop, we can monitor our memcache implementation from end to end."
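The compression fix Garner describes can be illustrated with a minimal Python sketch. It is not Concur's actual code: the function names, the use of zlib, and the compressed flag are assumptions made for illustration, and the 1MB figure is treated as a simple cap on the value size.

```python
import zlib
from typing import Tuple

# Assumed cap on a stored value, matching the default 1MB limit cited above.
MAX_ITEM_BYTES = 1024 * 1024


def prepare_value(raw: bytes, limit: int = MAX_ITEM_BYTES) -> Tuple[bytes, bool]:
    """Return (payload, compressed_flag) for a value about to be cached.

    Values that already fit are passed through untouched; larger values
    are compressed with zlib. Hypothetical helper, not a memcache API.
    """
    if len(raw) <= limit:
        return raw, False
    packed = zlib.compress(raw, 6)
    if len(packed) > limit:
        # Compression was not enough; the caller must split or skip the item.
        raise ValueError("value still exceeds the item size limit after compression")
    return packed, True


def restore_value(payload: bytes, compressed: bool) -> bytes:
    """Reverse prepare_value using the flag stored alongside the payload."""
    return zlib.decompress(payload) if compressed else payload
```

In this sketch the application would store the flag next to the payload (for example in the client's flags field, if it exposes one) so that readers know whether to decompress.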
Besides memcache, Concur uses the ExtraHop system to monitor and optimize the performance and scalability of its database tier. "We have thousands of database servers running two billion SQL queries and pushing out 10 terabytes of SQL data each day. This is the 'brain' of our application, and our DBAs use ExtraHop to make sure it is operating as efficiently as possible," says Garner.
Specifically, the R&D Operations team uses ExtraHop to pinpoint critical SQL queries and stored procedures that are performing poorly. "We customized analysis in the ExtraHop system to determine the total weight of SQL queries by calculating the number of times a query is run multiplied by the time required to return the data," explains Garner. "ExtraHop is the only solution that can provide this analysis continuously and in real time for our entire database infrastructure."
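The "total weight" calculation Garner describes, run count multiplied by response time, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (pairs of query text and elapsed milliseconds) and the function name are hypothetical, and the response time used is the mean per query, which the source does not specify.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def query_weights(
    observations: Iterable[Tuple[str, float]]
) -> List[Tuple[str, int, float, float]]:
    """Rank queries by weight = run count x mean response time.

    observations: (query_text, elapsed_ms) pairs, one per observed execution.
    Returns (query, count, mean_ms, weight) tuples, heaviest first.
    """
    counts = defaultdict(int)
    total_ms = defaultdict(float)
    for query, elapsed_ms in observations:
        counts[query] += 1
        total_ms[query] += elapsed_ms

    ranked = []
    for query, count in counts.items():
        mean_ms = total_ms[query] / count
        ranked.append((query, count, mean_ms, count * mean_ms))
    # Heaviest queries first: these are the strongest tuning or caching candidates.
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked
```

Ranking by weight rather than by latency alone surfaces queries that are individually fast but run so often that they dominate total database time.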
The alternative approach to measuring SQL performance would be to routinely run SQL traces on each database with a profiler—hardly feasible given the scale of Concur's infrastructure. "Our DBAs could either run a trace on each database, which would be like shining a spotlight on a small section of a highway, or they could use the ExtraHop system, which is like lighting the entire highway," says Garner.
Flexible Real-Time Analysis
One reason Concur has seen such great success with ExtraHop is the use of the Application Inspection Triggers technology, a framework for real-time analysis based on scriptable event processing at the application-protocol level. "We use Application Inspection Triggers to target specific real-time metrics that we want to investigate," says Garner. "For instance, we had an abnormally high rate of HTTP aborts for a pool of 60 front-end webservers that host three different sites. We have so much traffic to this pool that it was extremely difficult to isolate the problem server using our user-experience monitoring tool. By customizing the analysis in the ExtraHop system, we could identify a server that was configured to debugging mode. We turned debugging off and immediately saw the HTTP aborts fall by 95 percent in ExtraHop."
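The idea behind isolating the misconfigured server, breaking an aggregate abort rate down per server, can be sketched as follows. This is plain Python for illustration only and does not use or represent the Application Inspection Triggers API; the event format and the server names in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def abort_rate_by_server(events: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of aborted HTTP requests for each server.

    events: (server_name, aborted) pairs, one per observed request.
    """
    totals = Counter()
    aborts = Counter()
    for server, aborted in events:
        totals[server] += 1
        if aborted:
            aborts[server] += 1
    return {server: aborts[server] / totals[server] for server in totals}


def worst_server(events: List[Tuple[str, bool]]) -> Tuple[str, float]:
    """Return (server, abort_rate) for the server with the highest abort rate.

    Raises ValueError if events is empty.
    """
    rates = abort_rate_by_server(events)
    return max(rates.items(), key=lambda item: item[1])
```

With per-server rates in hand, a single outlier in a pool of many servers stands out even when the pool-wide average hides it.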