An online travel company experienced significant business growth due to the rapid adoption of its APIs by partner affiliates. The easier and faster it was to adopt their travel product APIs, the more affiliates joined, referred, or sold services through them, which was deemed a significant competitive advantage and revenue driver. As this part of the business rapidly grew in volume and profitability, it also represented an increasing risk to the business, as any API disruption could result in a heavy loss. Outage loss estimates ranged between $3,000 and $60,000 per minute depending upon affiliate, product type, and time of year.
They depended on the APIs of their vendors for real-time product availability and price updates, but lacked end-to-end visibility into the performance and availability of these web services by affiliate partner, vendor, and product. They also lacked a fast way to discover and classify APIs as they were added, updated, or retired.
Attempting to instrument logging for these transactions was costly and slowed application development, and the logs could not correlate upstream and downstream network, infrastructure, and application behavior with the APIs. An agent-based monitoring approach proved costly and complex, and it lacked both speed and upstream and downstream visibility.
When the operations team discovered a problem, they didn't always know whether it was their network, their appliance, or a related service.
- The application team had developed thousands of APIs, many tailored for each data consumer depending upon the business relationship.
- Retirement of older APIs caused outages for some affiliate partners. The APIs had diverse connectivity methods: Internet links, direct VPN connections over IPsec or SSL, and other direct links like MPLS.
- Nearly every API had its own unique SLA and could not be managed in isolation, since they were all part of a series of web applications that provided the travel data.
- Ensure API and web services infrastructure meet SLAs.
- Autodiscover, classify, and categorize APIs by service, partner, vendor, and product.
- Provide real-time analysis of every API's response and process time and its success, failure, error type and rate, and transaction volume by partner and vendor.
- Automap and correlate API performance with network, infrastructure, and application performance dependencies.
- Provide proactive early warning through trend-based analysis and alerts.
- Require no server, infrastructure, or application changes and add no overhead on servers.
- Integrate and improve workflows for rapid problem identification and resolution.
- Extract revenue values from each transaction to correlate historical and real-time value of the service with its volume and performance.
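The per-partner and per-vendor analysis described in the requirements above can be illustrated with a minimal sketch. It is not the ExtraHop implementation; the `Transaction` record, its field names, and the rule that any status of 400 or higher counts as an error are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical record for one observed API call (field names are assumptions)."""
    partner: str
    vendor: str
    status: int            # HTTP status code of the API response
    process_ms: float      # server processing time in milliseconds
    revenue: float = 0.0   # booking value extracted from the payload, if any


def summarize(transactions):
    """Group by (partner, vendor) and compute volume, error rate, mean time, revenue."""
    groups = defaultdict(list)
    for t in transactions:
        groups[(t.partner, t.vendor)].append(t)

    summary = {}
    for key, items in groups.items():
        # Assumption: any 4xx or 5xx response counts as an error.
        errors = sum(1 for t in items if t.status >= 400)
        summary[key] = {
            "volume": len(items),
            "error_rate": errors / len(items),
            "mean_process_ms": sum(t.process_ms for t in items) / len(items),
            "revenue": sum(t.revenue for t in items),
        }
    return summary
```

Keeping revenue in the same record as timing and status is what allows the value of a service to be read alongside its volume and performance.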
If an API went down, their monitoring tools often showed that everything was okay, although it wasn't. Hours and sometimes days passed before the operations team found out that there was a problem.
First, the online travel company needed real-time classification of API traffic. Unique strings in each API allowed the ExtraHop platform to identify and classify the API traffic. Using this identification and classification, the administrators implemented two real-time dashboards:
- A dashboard to track the top 100 APIs, their completion time, network statistics, and operations information, such as amount of traffic, requests, responses, and errors.
- A dashboard using the ExtraHop Open Data Stream (ODS) to send all the information to their Elastic system to track the entire history of their API users.
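Classifying traffic by a unique string in each API can be sketched roughly as follows. The markers and labels here are hypothetical, since the source does not list the actual strings, and the code illustrates the general idea rather than how the ExtraHop platform performs the matching.

```python
# Hypothetical markers; the real unique strings are not given in the source.
# Order matters: the first marker found in the URI wins.
API_MARKERS = [
    ("/affiliate/hotels/", {"service": "hotel-availability", "product": "hotels"}),
    ("/affiliate/flights/", {"service": "flight-search", "product": "flights"}),
    ("/vendor/pricing/", {"service": "price-update", "product": "pricing"}),
]


def classify(uri):
    """Return the labels for the first marker contained in the URI, or None."""
    for marker, labels in API_MARKERS:
        if marker in uri:
            return labels
    return None  # unclassified traffic, a candidate for a new marker
```

Traffic that returns `None` is worth surfacing on its own, because it may correspond to an API that was added or changed without being registered.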
Next, they collected network metrics about the incoming paths, so administrators could quickly identify or rule out the network as the source of any given outage. Because customers connecting via SSL VPN or IPsec had different issues from those connecting directly over the Internet, administrators mapped out the appropriate metrics (such as MTU, window sizes, and SSL statistics) for each class of incoming connection. When it came to break/fix resolution, all information was at hand, including dependencies. Finally, using ExtraHop activity maps, administrators could see exactly which downstream dependencies the APIs had, in real time, without having to track the individual connections. They then organized their applications around each application component as well as the APIs supporting those applications.
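The idea of listing every downstream dependency of an API without tracing individual connections amounts to a graph traversal. The sketch below assumes the observed dependencies are already available as a simple adjacency mapping; the service names are invented, and this does not represent how activity maps are built internally.

```python
from collections import deque


def downstream(dependencies, start):
    """Return every service reachable from `start` by following dependency edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in dependencies.get(node, ()):
            # Skip services already visited so that cycles terminate.
            if dep not in seen and dep != start:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency edges: each key calls the services in its list.
example = {
    "hotel-search-api": ["pricing-service", "inventory-db"],
    "pricing-service": ["vendor-gateway"],
    "vendor-gateway": ["pricing-service"],  # a cycle, handled by `seen`
}
```

With the example mapping, `downstream(example, "hotel-search-api")` yields the pricing service, the inventory database, and the vendor gateway, which is the set of components to check when that API degrades.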
Through identification, classification, and dependency tracking, administrators finally had a handle on the API environment. With all of this information from the ExtraHop platform, they could track the SLAs of each environment individually and triage the most valuable APIs. In the first month alone, they saved an estimated $200,000 or more through proactive early warning of API performance impacts. The time saved by automating web services governance was equal to one full-time headcount, further reducing costs. Most importantly, they could now understand the performance and degree of usage of each API, and whether they and their partners were meeting their SLAs, which dramatically reduced business risk.
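Trend-based early warning of the kind credited above can be approximated by comparing recent response times with an earlier baseline and flagging degradation before the SLA limit is reached. This is a minimal sketch; the window size, the 1.5x degradation ratio, and the status names are assumptions, not values from the case study.

```python
from statistics import mean


def sla_status(samples_ms, sla_ms, window=5, ratio=1.5):
    """Classify recent response times as 'ok', 'warning', or 'breach'.

    samples_ms: response times in milliseconds, oldest first.
    sla_ms:     the SLA limit for this API (assumed to be a mean response time).
    """
    if not samples_ms:
        return "ok"
    recent = mean(samples_ms[-window:])
    if recent >= sla_ms:
        return "breach"
    baseline = samples_ms[:-window]
    # Warn when the recent window is markedly slower than the earlier baseline,
    # even though the SLA itself has not been violated yet.
    if baseline and recent >= ratio * mean(baseline):
        return "warning"
    return "ok"
```

A "warning" result is the early signal: the API still meets its SLA, but the trend suggests it may not for long.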