Under the Hood: How ExtraHop Takes Advantage of Multi-core Processing

[2/23/2015 - Editor's update: ExtraHop has now released a 40 Gbps appliance, the EH9100. Read more in our press release: ExtraHop Enables Real-Time IT Analytics at Groundbreaking Speed]

When we talk to IT teams who are considering ExtraHop, the discussion often turns to scalability. People are skeptical, and rightfully so. Many monitoring vendors sell the dream of real-time wire data analysis. In reality, they deliver it only for a subset of traffic and for a relatively small number of concurrent flows, or they write the bulk of the data to huge disk arrays for post-hoc analysis.

We love to talk to people about scalability and performance because it matters. For real-time analysis, if you can't keep up, you fall behind, and if you fall behind, you might never catch up again. Additionally, greater scalability of real-time monitoring offers IT teams visibility into very large environments in which they previously were flying blind, and it offers a more cost-effective approach with fewer appliances.

The EH8000: An All-in-One Operational Intelligence Platform

Our new EH8000 appliance performs real-time, L2-L7 transaction analysis at up to a sustained 20 Gbps. Throughput is only part of the picture. A single EH8000 can analyze more than 400,000 transactions per second, extracting application-level health and performance metrics such as URIs associated with HTTP 500 errors, slow stored procedures in a database, or the location of corrupt files in network-attached storage. This level of performance is far beyond what other passive monitoring vendors even advertise, let alone what they actually deliver. For example, our EH8000 is more than an order of magnitude faster than the recently announced TruView appliance from Visual Network Systems, which, according to their own materials, analyzes only one million transactions per minute, or fewer than 17,000 per second. With analysis of more than 400,000 transactions per second, the ExtraHop platform is a true market leader.
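The comparison above comes down to a unit conversion: a per-minute rate divided by 60. A quick check, using only the two figures quoted in that paragraph:

```python
# Figures quoted in the paragraph above.
truview_per_minute = 1_000_000   # TruView: transactions per minute (per vendor materials)
eh8000_per_second = 400_000      # EH8000: transactions per second (lower bound)

truview_per_second = truview_per_minute / 60      # convert per-minute to per-second
ratio = eh8000_per_second / truview_per_second    # how many times faster

print(f"TruView: {truview_per_second:,.0f} transactions/s")   # prints "TruView: 16,667 transactions/s"
print(f"EH8000 / TruView: {ratio:.1f}x")                      # prints "EH8000 / TruView: 24.0x"
```

A ratio of about 24 is comfortably more than one order of magnitude (10x).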

Even with our current lead, I believe that ExtraHop will continue to widen the scalability gap compared to other products on the market. This is a bold claim, so please allow me to explain why.

Reason #1 – ExtraHop was built from the ground up for multi-core processing.

The first reason for ExtraHop's substantial performance lead—and the reason why I believe ExtraHop will continue to widen the gap—is that our platform was built from the ground up for multi-core processing. Network processing is embarrassingly parallel and can be easily split across multiple cores. Systems that are more parallelized see greater speedup with more cores, according to Amdahl's Law. The chart below illustrates the effect of Amdahl's Law, where a program that is 95% parallelized sees a maximum speedup that is five times the maximum speedup of a program that is only 75% parallelized.* While other analysis products will see some benefit from multi-core processing, the ExtraHop platform, which is unburdened by legacy architectures and built from the ground up for multi-core processing, will continue to see tremendous benefit.
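Amdahl's Law says that a program whose parallelizable fraction is p runs S(n) = 1 / ((1 - p) + p/n) times faster on n cores, and that this speedup approaches a ceiling of 1 / (1 - p) as n grows. With p = 0.95 the ceiling is 20x; with p = 0.75 it is 4x, which is where the factor of five comes from. A small sketch (the function names are illustrative, not from any particular library):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores for a program whose parallelizable fraction is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # Serial part (1 - p) takes the same time; parallel part p is split n ways.
    return 1.0 / ((1.0 - p) + p / n)


def amdahl_limit(p: float) -> float:
    """Maximum speedup as the core count grows without bound: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


for cores in (1, 2, 4, 8, 16, 64):
    print(f"{cores:>3} cores: 95% -> {amdahl_speedup(0.95, cores):5.2f}x, "
          f"75% -> {amdahl_speedup(0.75, cores):4.2f}x")

# Ratio of the two ceilings: 20x / 4x.
print(f"ceiling ratio: {amdahl_limit(0.95) / amdahl_limit(0.75):.1f}")  # prints "ceiling ratio: 5.0"
```

Note that the gap grows with core count: at 16 cores the 95% program is already about 2.9 times faster than the 75% program, and the difference keeps widening as cores are added.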

Vendors who are working to convert their existing code to run faster on newer multi-core processors face an uphill battle. As the recent Dr. Dobb's report State of Parallel Programming 2012 states, "Refactoring existing code is particularly challenging, so the researchers recommend that parallelism be part of the design from the start." The report goes on to detail the kinds of concurrency bugs that developers often struggle with when converting existing serial code to parallel code.

Even at ExtraHop, where our software is designed for multi-core processing, we still deal with issues such as lock contention, concurrent access, NUMA (non-uniform memory access) effects, and cache ping-ponging. These are sophisticated problems that can have disastrous consequences if handled poorly, especially in this type of high-performance appliance, and there are relatively few development tools that can help.
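One common way to avoid lock contention and concurrent access to shared flow state is to give each core exclusive ownership of a subset of flows. A symmetric hash of the connection's 5-tuple sends both directions of a flow to the same worker, so each worker's flow table is private and needs no lock, and per-worker results are merged only when someone reads them. The sketch below is a generic illustration, not a description of ExtraHop's internals; `Packet`, `flow_key`, `worker_for`, `Worker`, and `dispatch` are hypothetical names, and plain Python objects stand in for cores.

```python
import zlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    length: int


def flow_key(pkt: Packet) -> tuple:
    """Direction-independent key: sort the endpoints so A->B and B->A match."""
    lo, hi = sorted(((pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)))
    return (pkt.proto, lo, hi)


def worker_for(pkt: Packet, n_workers: int) -> int:
    """Symmetric hash of the flow key; crc32 keeps the mapping stable across runs."""
    return zlib.crc32(repr(flow_key(pkt)).encode()) % n_workers


@dataclass
class Worker:
    # Only this worker ever touches the flows it owns, so no lock is needed.
    bytes_per_flow: Counter = field(default_factory=Counter)

    def process(self, pkt: Packet) -> None:
        self.bytes_per_flow[flow_key(pkt)] += pkt.length


def dispatch(packets, n_workers: int) -> list:
    workers = [Worker() for _ in range(n_workers)]
    for pkt in packets:
        workers[worker_for(pkt, n_workers)].process(pkt)
    return workers


def merged_totals(workers) -> Counter:
    """Combine per-worker state only when a reader asks for it."""
    total = Counter()
    for w in workers:
        total.update(w.bytes_per_flow)  # Counter.update adds counts
    return total


pkts = [
    Packet("10.0.0.1", 51000, "10.0.0.2", 80, "tcp", 100),
    Packet("10.0.0.2", 80, "10.0.0.1", 51000, "tcp", 1500),  # reply direction
    Packet("10.0.0.3", 52000, "10.0.0.2", 443, "tcp", 200),
]
workers = dispatch(pkts, n_workers=4)
print(merged_totals(workers))
```

In hardware, NICs can perform a similar distribution with Receive Side Scaling (RSS) configured with a symmetric hash key, so the software dispatch loop above is only there to make the idea visible.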

Reason #2 – ExtraHop's Engineering team is committed to performance.

Writing high-performance code is a rarely practiced art. Most software developers work on front-end applications that have relatively forgiving timing constraints. ExtraHop does not have this luxury with real-time packet processing, so we are laser-focused on writing performance-sensitive code. We constantly profile our systems to seek out bottlenecks, especially in the packet path. If new code adds as few as 1,000 CPU cycles, we notice. We also pay close attention to caching effects, in both the dedicated per-core caches and the shared on-die caches. This is not to say that other vendors' engineering teams are not committed to performance, only that our focus on performance is one of the reasons the ExtraHop platform can perform real-time transaction analysis at a sustained 20 Gbps.
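To see why 1,000 cycles is noticeable, it helps to work out the per-packet cycle budget on a saturated link. The 20 Gbps figure comes from the text; the core count and clock speed below are hypothetical assumptions chosen only for illustration, not the EH8000's actual hardware. Each Ethernet frame also carries 20 bytes of on-wire overhead (8-byte preamble plus 12-byte inter-frame gap).

```python
def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Packets/s on a saturated Ethernet link, counting 20 bytes of per-frame overhead."""
    return link_bps / ((frame_bytes + 20) * 8)


def cycles_per_packet(cores: int, clock_hz: float, pps: float) -> float:
    """CPU cycles available per packet if every core does nothing but packet work."""
    return cores * clock_hz / pps


LINK_BPS = 20e9      # 20 Gbps, from the text
CORES = 16           # hypothetical core count
CLOCK_HZ = 2.5e9     # hypothetical 2.5 GHz clock

for frame in (64, 512, 1518):   # minimum, mid-size, and maximum standard frames
    pps = packets_per_second(LINK_BPS, frame)
    budget = cycles_per_packet(CORES, CLOCK_HZ, pps)
    print(f"{frame:>5}-byte frames: {pps / 1e6:6.2f} Mpps, ~{budget:,.0f} cycles/packet")
```

Under these assumptions, minimum-size frames leave roughly 1,344 cycles per packet across all cores, so an extra 1,000 cycles in the packet path would consume about three quarters of the budget.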

As an aside, if you are a software engineer looking to solve kernel-level, systems-engineering problems and enjoy working with an outstanding team of developers, we're hiring.

Reason #3 – ExtraHop uses OS bypass for the data plane.

ExtraHop uses a custom Linux distribution for activities on the control plane, such as running the administration UI and configuring the system. For the data plane, ExtraHop uses a proprietary networking microkernel that runs on bare metal for the fastest possible performance. Handling packet scheduling and memory management itself, and talking directly to I/O devices, lets the microkernel speed up packet processing considerably.
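One common form of custom memory management in a packet path is to allocate all packet buffers once, at startup, and recycle them through a free list so the hot path never calls a general-purpose allocator. The text does not say how the ExtraHop microkernel manages memory; the sketch below only illustrates that generic technique, and `BufferPool` with its methods is a hypothetical name.

```python
from typing import Optional


class BufferPool:
    """Fixed pool of equal-size packet buffers carved out of one preallocated region."""

    def __init__(self, count: int, buf_size: int) -> None:
        self._region = bytearray(count * buf_size)   # single allocation, done up front
        self._view = memoryview(self._region)
        self._buf_size = buf_size
        self._free = list(range(count))              # free slot indices, used as a LIFO stack

    def alloc(self) -> Optional[int]:
        """Take a free slot, or return None when the pool is exhausted (caller drops)."""
        return self._free.pop() if self._free else None

    def buffer(self, slot: int) -> memoryview:
        """Zero-copy view of one slot's bytes."""
        start = slot * self._buf_size
        return self._view[start:start + self._buf_size]

    def free(self, slot: int) -> None:
        """Return a slot to the pool without deallocating anything."""
        self._free.append(slot)

    @property
    def available(self) -> int:
        return len(self._free)


pool = BufferPool(count=4, buf_size=2048)
slot = pool.alloc()
frame = bytes(range(60))                   # stand-in for a received frame
pool.buffer(slot)[:len(frame)] = frame     # copy into the preallocated slot
pool.free(slot)                            # recycle for the next packet
```

Using the free list as a stack means the most recently released buffer is handed out next, which makes it more likely that buffer is still warm in cache.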

In addition to packet processing, another challenge is recording the stream of health and performance metrics to persistent storage. When we were designing the ExtraHop platform, we considered many commercial and open-source databases. We ended up rejecting these options because they would have required continuous management and administrative tuning. Most importantly, these RDBMSes couldn't handle the level of sustained reads and writes that the ExtraHop platform requires. We also tried pure file-based systems that didn't scale and investigated less-structured datastores such as Berkeley DB and Tokyo Cabinet. We could have solved this problem by throwing money at it, such as by requiring our users to purchase an expensive SQL cluster, but we wanted to build an all-in-one appliance with a small footprint that required little care and feeding.

To keep our deployment simple and make real-time analysis available to users immediately, we built a proprietary, high-speed, real-time streaming datastore that is optimized for telemetry, or time-sequenced data. This datastore bypasses the operating system to read from and write to block devices directly, and it uses fast in-memory indexing so that data can be read as soon as it is written, similar to how Google uses Bigtable for web indexing.
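The core idea of a telemetry store whose in-memory index is updated on every write, so that a sample is queryable the moment it lands, can be sketched in a few lines. This is a hypothetical illustration, not ExtraHop's design: `TelemetryStore` is an invented name, an `io.BytesIO` stands in for the raw block device, and records are fixed-size (timestamp, value) pairs.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<qd")   # (timestamp in ms, metric value): 16 bytes per record


class TelemetryStore:
    def __init__(self, device=None) -> None:
        self._dev = device if device is not None else io.BytesIO()  # stand-in for a block device
        self._times = []   # in-memory index: timestamps in append (and therefore time) order

    def append(self, ts_ms: int, value: float) -> None:
        """Append one sample; time-sequenced data arrives in non-decreasing order."""
        if self._times and ts_ms < self._times[-1]:
            raise ValueError("samples must be appended in time order")
        self._dev.seek(len(self._times) * RECORD.size)   # fixed-size records: offset = index * size
        self._dev.write(RECORD.pack(ts_ms, value))
        self._times.append(ts_ms)                         # index updated immediately: readable at once

    def query(self, start_ms: int, end_ms: int) -> list:
        """All (timestamp, value) samples with start_ms <= timestamp < end_ms."""
        lo = bisect.bisect_left(self._times, start_ms)
        hi = bisect.bisect_left(self._times, end_ms)
        self._dev.seek(lo * RECORD.size)
        data = self._dev.read((hi - lo) * RECORD.size)    # one contiguous read for the range
        return [RECORD.unpack_from(data, i * RECORD.size) for i in range(hi - lo)]


store = TelemetryStore()
store.append(1_000, 0.25)
store.append(2_000, 0.50)
print(store.query(0, 2_001))   # prints [(1000, 0.25), (2000, 0.5)]
```

Because samples arrive in time order, the index stays sorted for free, and a time-range query becomes two binary searches plus a single sequential read.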

ExtraHop Platform Architecture

You Are Right to Care About Scalability and Performance

ExtraHop cares as much about performance as you do. Performance affects how much value you get from the product, and it also affects data fidelity. If a load balancer, switch, firewall, or other in-line device is overloaded and drops packets, the sender simply retransmits them (assuming a reliable transport protocol such as TCP). An out-of-line device that receives traffic from a SPAN port or network tap has no such recovery mechanism: nobody retransmits the packets it drops, so if it is overloaded, its analysis is incomplete.

When choosing a real-time transaction-analysis solution, be sure to question the vendor on scalability. Ask when their solution was first developed and whether it has been redesigned for multi-core chip architectures. If they claim a certain level of throughput, ask whether they can handle high packet rates as well; many monitoring products that do not scale in real-world environments quote only one side of performance, such as bits per second, and say nothing about packets or transactions per second. And, finally, be sure to contact us so we can show you the ExtraHop difference!

* It's worthwhile to consider why parallelization has become necessary. Since 2005, increases in clock speed have plateaued while transistor counts have continued to grow according to Moore's Law (see the graph below). During the same period, CPUs have gone from one core to two, four, six, eight, and sixteen; Intel's dual-core Itanium 2, for example, arrived in 2006. To see the maximum benefit from new processors, software developers must understand how to parallelize their systems. As experts have noted, this means the free lunch is over for software developers: code no longer gets faster automatically with each new hardware generation. As a recent Intel whitepaper put it, "The future of computing is parallel computing, and the future of programming is parallel programming."
