In IT, there's an adage that packets never lie. Packet captures are the gold standard for determining what actually happened. This is true, as the observed communication over the network is empirical data. However, as a means to provide visibility to a modern IT organization, continuous packet capture is horribly outdated and inadequate.
Until now, network teams have only had network performance monitoring (NPM) tools that wrote packets to disk, also known as data at rest. When you are trying to respond to performance or security issues proactively, it's better to analyze the information traversing the network while it's in motion, otherwise known as data in flight.
Write-to-Disk vs. Streaming
That's the main difference between legacy NPM tools, including those from Riverbed, Netscout, and Corvil, and the ExtraHop platform. Those products rely on write-to-disk architectures, where performance is limited by how fast the system can write to and read from disk. In contrast, ExtraHop uses a streaming architecture that analyzes all the packets before writing anything to disk, which moves the bottleneck to bus throughput, RAM, and CPU. See the diagram below.
Write-to-disk architectures have their roots in technology from 1987. They don't scale, and can't handle the volume of data in a modern network.
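As a rough illustration of the streaming idea, here is a minimal Python sketch that updates in-memory metrics as each packet arrives and then lets the packet go. It is not ExtraHop's implementation. The `Packet` record, the `packet_stream` feed, and the `analyze_in_flight` function are hypothetical names invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record (not a real wire format)."""
    src: str
    dst: str
    size: int  # bytes on the wire


def packet_stream() -> Iterator[Packet]:
    """Stand-in for a live feed; a real system would read from a tap or mirror port."""
    yield Packet("10.0.0.1", "10.0.0.9", 1500)
    yield Packet("10.0.0.2", "10.0.0.9", 400)
    yield Packet("10.0.0.1", "10.0.0.9", 600)


def analyze_in_flight(packets: Iterable[Packet]) -> Counter:
    """Update per-conversation byte counts as each packet arrives.

    Packets are not retained after they are counted, so memory grows with
    the number of conversations rather than the number of packets.
    """
    bytes_by_pair: Counter = Counter()
    for pkt in packets:
        bytes_by_pair[(pkt.src, pkt.dst)] += pkt.size
    return bytes_by_pair


metrics = analyze_in_flight(packet_stream())
```

Only the compact `metrics` summary would need to be stored, which is why the limiting resources in this model are CPU and memory rather than disk throughput.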
To understand the limits of write-to-disk architectures, it's helpful to remember that packet capture technology was implemented in 1987, when Van Jacobson created tcpdump. Today's network speeds are well over 10,000 times as fast, which is why network performance monitoring tools keep adding disk spindles, faster disk speeds, better compression, and deeper buffers. These are band-aids on the problem. Looking at the amount of storage in their product portfolios, you could be excused for thinking that these NPM vendors are in the storage business!
Meanwhile, their users are struggling to keep up with the deluge of data. Enterprises we have spoken with complain that their packet capture solutions cannot provide enough lookback to cover a weekend!
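A back-of-the-envelope calculation shows why a weekend of lookback is hard to reach with continuous capture. The figures below are assumptions chosen for illustration (one 10 Gbps link, 30% average utilization, a 48-hour weekend, no compression), not measurements from any customer.

```python
def capture_bytes(link_gbps: float, utilization: float, hours: float) -> float:
    """Bytes written by an uncompressed continuous capture over the window."""
    bits_per_second = link_gbps * 1e9 * utilization
    return bits_per_second / 8 * hours * 3600


# Assumed example: one 10 Gbps link at 30% average utilization for 48 hours.
weekend_tb = capture_bytes(link_gbps=10, utilization=0.30, hours=48) / 1e12
print(f"{weekend_tb:.1f} TB")  # prints 64.8 TB
```

Under these assumptions a single link needs tens of terabytes for one weekend, and the requirement scales linearly with link speed, utilization, and the number of monitored links.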
Why ExtraHop Chose Streaming Architecture
When Jesse and I designed the architecture of the ExtraHop platform, we wanted to build a product that could mine the wire for real-time insights. We knew that there was a wealth of information that was on the wire, but that the volume of this data made the traditional continuous packet capture approach infeasible.
With our previous experience building a platform that brokered streams of packets at massive scale, we understood that advances in multicore processing and storage would be able to keep up with expected increases in network traffic, especially as high-speed stream parsing is a highly parallelizable task. To learn more, read Jesse's blog post, Under the Hood: How ExtraHop Takes Advantage of Multicore Processing.
Instead of the traditional write-to-disk architecture, we opted for a streaming approach for several reasons:
- Application fluency – Enterprises value application-level details such as errors, methods, users, status codes, reads/writes, record types, host queries, and other details contained at Layer 7. We wanted to provide this application-level visibility for the entire environment in real time and not after the fact, and the only way to do this at scale was with a streaming architecture. Retrofitting a write-to-disk architecture for real-time application fluency is impossible.
- Longer lookback – Relying on stored packets for analysis limits the amount of lookback available for capacity planning, before-and-after comparison, and other tasks. We've heard practitioners complain that a system offering just two days of lookback is of little use. In contrast, ExtraHop's streaming architecture extracts valuable events and metrics (the proverbial needles in the haystacks) and stores them for a minimum of 30 days on the appliance. Customers can also make use of their own NAS for even longer lookback.
- It's more than just network teams – One of the megatrends in IT is the breaking down of silos. Legacy NPM tools are too packet-centric and require too much expertise to be of use to people without a networking background. By doing the deep analysis upfront, the ExtraHop platform presents performance, security, and business data in terms that everyone can understand. ExtraHop is an excellent solution for network teams, but it also brings them into the DevOps fold, where teams share visibility into how well applications are running. Our customers have told us that one of the greatest outcomes from deploying ExtraHop is a better-functioning team, aligned to the delivery of a service rather than to its technology silo.
- Packet capture is an option, not a requirement – Using a streaming architecture doesn't mean that we cannot capture the raw packets when necessary; we simply make capture optional. We implemented packet capture in a novel way: we store only the packets comprising flows and sessions that are of interest. You can set policies to capture the packets associated with specific events, such as when a user writes to a sensitive storage partition or when a malformed request causes an application error. This type of precision packet capture is only possible with a streaming architecture that reassembles packets into complete flows in real time. Additionally, some customers have used our open APIs to store continuous packet captures in inexpensive third-party solutions.
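The policy-driven capture described in the last bullet can be sketched roughly as follows. This is a simplified illustration, not ExtraHop's implementation: it groups packets by a flow key instead of doing real TCP reassembly, and `FlowKey`, `Packet`, `precision_capture`, and the policy function are hypothetical names made up for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Iterable, List, Tuple

FlowKey = Tuple[str, str, int]  # (client, server, server port); hypothetical key


@dataclass(frozen=True)
class Packet:
    flow: FlowKey
    payload: bytes


def precision_capture(
    packets: Iterable[Packet],
    is_of_interest: Callable[[Packet], bool],
    max_buffered: int = 1000,
) -> Dict[FlowKey, List[Packet]]:
    """Keep packets only for flows that trip the policy.

    Each flow gets a bounded in-memory buffer. When a packet matches the
    policy, the flow's buffered packets are promoted to the capture, and
    later packets on that flow are stored directly.
    """
    buffers: Dict[FlowKey, Deque[Packet]] = defaultdict(
        lambda: deque(maxlen=max_buffered)
    )
    captured: Dict[FlowKey, List[Packet]] = {}
    for pkt in packets:
        if pkt.flow in captured:
            captured[pkt.flow].append(pkt)
        elif is_of_interest(pkt):
            captured[pkt.flow] = list(buffers.pop(pkt.flow, ())) + [pkt]
        else:
            buffers[pkt.flow].append(pkt)
    return captured


# Assumed policy for illustration: capture any flow that returns an HTTP 500.
flow_a: FlowKey = ("10.0.0.1", "10.0.0.9", 80)
flow_b: FlowKey = ("10.0.0.2", "10.0.0.9", 80)
traffic = [
    Packet(flow_a, b"GET /report"),
    Packet(flow_b, b"GET /health"),
    Packet(flow_a, b"HTTP/1.1 500 Internal Server Error"),
    Packet(flow_b, b"HTTP/1.1 200 OK"),
]
kept = precision_capture(traffic, lambda p: b" 500 " in p.payload)
```

In this example only the flow that produced the error is kept, including the request that preceded it, while the healthy flow is never written out.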
Bring NPM into the Modern Era
With the ExtraHop platform, you can bring your network monitoring into the modern era. Many of our customers are on network teams. Before adopting ExtraHop, these people were called in as a last resort in their organizations because they knew how to decipher packet captures. They would spend hours sifting through these captures, looking for the needle in the haystack. Now, these highly skilled engineers equip application teams and others with the visibility that they need through ExtraHop. Instead of being relegated to the role of "the sniffer guy," these people have elevated themselves and their teams to become key players in IT, security, and even business operations.
Ready to make the leap from legacy NPM to IT Operations Analytics for network teams? Check out our online demo and then schedule a meeting.