How ExtraHop Analyzes 1 PB+ Per Day Without Breaking a Sweat

ExtraHop is setting the bar again for enterprise analytics scalability

It's an axiom in IT that data volumes and rates are always growing. By 2021, global IP traffic will nearly double, according to the Cisco Visual Networking Index. What this means for enterprises is that if they aren't already concerned about the ability of their analytics solutions to keep up, then they ought to be.

If scaling your analytics platform becomes technically (and economically) infeasible, you have to start making trade-offs. You have to choose which logs you are going to collect, which servers you will instrument, or which network traffic you are going to analyze. Those trade-offs result in lost or partial visibility, a return to the "Dark Ages" where IT and security decisions are made on best guesses. Or your organization once again becomes reliant on the valiant fire-fighting efforts of overworked and stressed IT and security teams.

Setting the Bar (Again) for Enterprise Analytics Scale

ExtraHop has released a new flagship appliance that analyzes a sustained 100 Gbps of network traffic. That's the equivalent of 1+ PB of data per day in a single appliance (100 Gbps is 12.5 GB per second, or roughly 1.08 PB over 24 hours), which is unheard of in the enterprise analytics space! This blog post will explain how this is possible and what it means for your organization.
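The 1+ PB figure follows directly from the line rate. The short calculation below is only a sanity check, assuming decimal units (1 Gbps = 10^9 bits per second, 1 PB = 10^15 bytes) and a link that stays saturated for a full 24 hours; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def bytes_per_day(gbps: float) -> float:
    """Convert a sustained line rate in Gbps to bytes transferred in one day."""
    bits_per_second = gbps * 1e9
    bytes_per_second = bits_per_second / 8  # 100 Gbps -> 12.5 GB/s
    return bytes_per_second * SECONDS_PER_DAY


petabytes_per_day = bytes_per_day(100) / 1e15
print(f"{petabytes_per_day:.2f} PB/day")  # 1.08 PB/day
```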

First off, some definitions … When ExtraHop talks about 100 Gbps, we don't just mean writing packets to disk at that speed. That's what some other network analysis vendors mean when they say their platforms support 100 Gbps. No, at ExtraHop we're about real-time analysis before anything is written to disk. Technically, this approach is called stream processing, meaning that the processing takes place in memory as the data arrives, rather than by querying a datastore after the fact.
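To make the distinction concrete, here is a minimal sketch of the stream processing idea in general, not of ExtraHop's implementation. It assumes a hypothetical `Transaction` record with a server name and a latency; the aggregates are updated in memory as each record arrives, and the raw records are never stored or queried afterward.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Transaction:
    """One observed request/response pair (hypothetical record shape)."""
    server: str
    latency_ms: float


@dataclass
class ServerStats:
    """Running aggregate kept in memory; raw transactions are not retained."""
    count: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0

    def update(self, latency_ms: float) -> None:
        self.count += 1
        self.total_ms += latency_ms
        self.max_ms = max(self.max_ms, latency_ms)

    @property
    def mean_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0


def stream_metrics(transactions: Iterable[Transaction]) -> Dict[str, ServerStats]:
    """Consume the feed once, updating per-server aggregates as items arrive."""
    stats: Dict[str, ServerStats] = defaultdict(ServerStats)
    for tx in transactions:
        stats[tx.server].update(tx.latency_ms)
    return dict(stats)


# A generator stands in for the live feed: items are produced one at a time.
feed = (Transaction(s, ms) for s, ms in [("db1", 12.0), ("web1", 3.0), ("db1", 48.0)])
stats = stream_metrics(feed)
print(stats["db1"].count, stats["db1"].mean_ms, stats["db1"].max_ms)  # 2 30.0 48.0
```

The store-then-query alternative would write all three transactions to disk first and compute the same numbers later with a query, so results lag the traffic and the cost grows with everything stored.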

What's the Technical Explanation of Stream Processing?

This is what happens when the ExtraHop platform performs stream processing, in order:

  • Decryption at scale - Before you can analyze traffic, you need to decrypt it. An increasing amount of data center traffic is encrypted, and for good reason. Sometimes it's because of regulatory requirements such as PCI-DSS, requirements imposed by business partners, or simply because of an increasing awareness of the capabilities of nation-state attackers. The ExtraHop platform performs bulk decryption of SSL/TLS traffic at line rate, even up to 100 Gbps. For traffic protected by forward-secrecy ciphers, such as those mandated by TLS 1.3, we also offer a novel solution that preserves the integrity of the encrypted sessions while also enabling our platform to decrypt the traffic for analysis. Many network security analytics vendors have their heads buried in the sand on this issue, and resort to hand-waving, saying that flow-level analysis is sufficient. We disagree. Enterprises need application-level visibility into encrypted traffic.

  • Real-time analysis at scale - ExtraHop's secret sauce is our ability to reconstruct every single conversation taking place on the network, and then extract valuable information about those conversations in real time. Real-world network traffic is messy, but TCP helps the client and server on each end of the conversation make sense of the packet soup. ExtraHop recreates the TCP state machines for each side of the conversation so that the system can reconstruct the transactions, flows, and sessions, and then extracts thousands of L2-L7 metrics about those conversations. All this is done in memory, before anything is written to disk. This analysis not only gives ExtraHop the juicy details about the conversation (SQL statements, error messages, file names, users, etc.) but also allows the platform to correlate everything across tiers, so that you can see how a slow Citrix login is related to contention at the storage tier, for example. Metrics are stored for 30 days at minimum, and you can use your network storage for longer lookback.

  • Machine learning at scale - Of course, this metadata is also tremendous data for machine learning: cleanly formatted, meaningful, and voluminous. The result is anomaly detection with an extremely high signal-to-noise ratio. Google Research Director Peter Norvig is quoted as having said, "We don't have better algorithms. We just have more data." This quote explains in part what differentiates ExtraHop's machine learning from that of competitors. In the machine learning world, features are the name of the game. ExtraHop's stream processor is the best feature extractor for machine learning in the world! Our customers also benefit from the network effect, meaning that the more places ExtraHop is deployed and used, the more accurate our anomaly detection becomes.

  • Forensics at scale - ExtraHop also equips your IT and security teams with packet-based forensics at scale. The new 100 Gbps appliance is complemented by an array of packet capture appliances that can scale depending on your needs. They use larger drives to provide excellent cost per terabyte, making it feasible to add packet capture to your deployment for compliance purposes or for deep-dive security forensics. Because of our stream processing approach, our platform is able to provide a drill-down from high-level metrics and transaction records to the precise packets that constitute a transaction or flow. Essentially, we've turned packet analysis on its head. Our approach is the opposite of that taken by other vendors, who start by querying packets written to disk. This surgical packet query is wicked fast, by the way. No more having to walk to refill your coffee while you wait for your query to execute, as with other solutions.
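Two of the steps above, reconstructing a conversation from messy TCP segments and drilling down from a transaction to its packets, can be illustrated with a toy example. This is a simplified sketch and not ExtraHop's implementation: it handles one direction of one flow, ignores the handshake, sequence-number wraparound, and connection teardown, and the `StreamReassembler` class, `FlowKey` tuple, addresses, and packet IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


@dataclass
class StreamReassembler:
    """Rebuilds one direction of a TCP byte stream entirely in memory."""
    next_seq: int  # next in-order sequence number expected
    pending: Dict[int, bytes] = field(default_factory=dict)  # out-of-order data
    data: bytearray = field(default_factory=bytearray)       # in-order stream
    packet_ids: List[int] = field(default_factory=list)      # drill-down index

    def add_segment(self, packet_id: int, seq: int, payload: bytes) -> None:
        # Remember every packet seen for this flow, including retransmissions.
        self.packet_ids.append(packet_id)
        if not payload or seq + len(payload) <= self.next_seq:
            return  # empty segment, or data that was already delivered
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        # Deliver buffered segments for as long as they close the gap.
        while self.pending:
            first = min(self.pending)
            if first > self.next_seq:
                break  # still waiting for a missing segment
            chunk = self.pending.pop(first)
            fresh = chunk[self.next_seq - first:]  # drop any overlapping prefix
            self.data += fresh
            self.next_seq += len(fresh)


flows: Dict[FlowKey, StreamReassembler] = {}
key: FlowKey = ("10.0.0.5", 51000, "10.0.0.9", 80)

# (packet_id, sequence number, payload): out of order, with one retransmission.
captured = [
    (0, 1000, b"GET /lo"),
    (1, 1013, b"TP/1.1\r\n"),
    (2, 1000, b"GET /lo"),
    (3, 1007, b"gin HT"),
]
for packet_id, seq, payload in captured:
    # Simplification: the first sequence number seen starts the stream.
    flow = flows.setdefault(key, StreamReassembler(next_seq=seq))
    flow.add_segment(packet_id, seq, payload)

flow = flows[key]
method, path, version = flow.data.split(b"\r\n")[0].decode().split(" ")
print(method, path, version)  # GET /login HTTP/1.1
print(flow.packet_ids)        # [0, 1, 2, 3]
```

Because the list of packet IDs is built while the stream is being analyzed, going from the extracted request back to its packets is a lookup rather than a scan of everything written to disk.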

Scalability in Your Analytics Platform Matters

In the end, these capabilities don't matter unless they have a tangible benefit to your organization. What it comes down to is that you don't have to make trade-offs in terms of visibility as your data volume grows.

The new ExtraHop flagship appliance makes it feasible to cover your entire environment, not only from a technical standpoint but also from a financial one. No more having to worry about how much data you are ingesting or which applications you are going to instrument. ExtraHop can give you the wire data visibility you need in a cost-effective manner.

Skeptical? We would love to show you the visibility ExtraHop can provide. Customers have compared ExtraHop to "turning on the lights" in their data center, often within minutes of providing the data feed.

For more information about how ExtraHop benefits from new generations of multicore CPUs, read our co-founder Jesse Rothstein's blog post Under the Hood: How ExtraHop Takes Advantage of Multi-core Processing.
