TCP Analysis: Where the Network Meets the Application, Part 2

April 22, 2016

A diagram of the OSI Model.

In Part 1 of this two-part series, we imagined the perfect framework for no-impact monitoring of an application stack and found it was right in front of us all along in the Transmission Control Protocol (TCP). Today, we delve into the details of TCP measurement, focusing on six TCP metrics featured in the ExtraHop system's analysis: Retransmissions, Retransmission Timeouts (RTOs), Round-Trip Time (RTT), Aborts, Throttling, and Zero Windows.

First, a quick refresher on TCP: TCP occupies the transport layer of the OSI model (also known as Layer 4), sandwiched neatly between the network layer and the higher-order application protocols. It guarantees delivery of packets across the network, automatically adjusts to changing network conditions, and signals resource contention in host servers.

Although TCP seems like a straightforward protocol, beneath the surface lies a wealth of technical trickery. To immerse yourself in TCP we recommend Wikipedia of course, along with the following texts:

A Wealth of Network and Application Performance Data

First of all, TCP tells us about the health of the underlying network. We observe retransmitted packets (or segments, in TCP-speak) and infer that they were retransmitted because something failed to deliver them the first time. Retransmissions are considered normal in that TCP recovers from a dropped segment and resumes the connection seamlessly. However, retransmissions are only normal until they're not. Each retransmission results in latency, and as retransmissions stack up, that latency gets worse.
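To make the inference above concrete, here is a minimal sketch of how an observer might flag retransmitted segments in one direction of a single flow. It assumes segments are available as hypothetical `(sequence number, payload length)` pairs and treats any data segment whose starting sequence number has already been seen as a retransmission. It ignores sequence-number wraparound and segments that are re-sent with different boundaries, so treat it as an illustration rather than how any particular product counts retransmissions.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int]  # (sequence number, payload length in bytes)


def find_retransmissions(segments: Iterable[Segment]) -> List[Segment]:
    """Return data segments whose starting sequence number was seen before."""
    seen_starts = set()
    retransmitted = []
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data and cannot be retransmitted data
        if seq in seen_starts:
            retransmitted.append((seq, length))
        seen_starts.add(seq)
    return retransmitted


# Hypothetical flow: the segment starting at 1000 is sent twice.
observed = [(1000, 500), (1500, 500), (1000, 500), (2000, 500), (2500, 0)]
print(find_retransmissions(observed))
```

Tracking starting sequence numbers, rather than the highest byte seen, keeps segments that merely arrive out of order from being counted as retransmissions.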

The worst-case scenario is a Retransmission Timeout (or RTO). An RTO occurs when the sender's retransmission timer expires without an acknowledgment from the receiver, for example when a retransmitted segment is itself lost. The sender must wait out the full timer before retransmitting again. Depending on the timeout value configured in the operating system, this delay can be anywhere from 1 to 8 seconds, which is an eternity in webpage time.
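One reason the delay grows so quickly is that TCP senders generally double the retransmission timer after each consecutive timeout. The sketch below assumes a 1-second initial timeout and a 60-second cap; both are illustrative parameters, and real values depend on the operating system and on the connection's measured RTT.

```python
from typing import List


def rto_backoff_schedule(initial_rto: float, retries: int,
                         max_rto: float = 60.0) -> List[float]:
    """Return the wait before each successive retransmission.

    The timer doubles after every timeout and is capped at max_rto.
    """
    waits = []
    rto = initial_rto
    for _ in range(retries):
        waits.append(min(rto, max_rto))
        rto *= 2
    return waits


schedule = rto_backoff_schedule(initial_rto=1.0, retries=4)
print(schedule)       # waits before the 1st through 4th retransmission
print(sum(schedule))  # total time spent waiting if all four time out
```

Under these assumptions, four consecutive timeouts cost 1, 2, 4, and 8 seconds of waiting, or 15 seconds in total for a single stalled segment.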

Measuring RTOs enables us to pinpoint congested switches and routers in an internal network, or even across the Internet. Similarly, Round-Trip Time (or RTT) is a measure of total network latency. Although every TCP stack maintains an internal estimate of RTT, the ExtraHop system can measure RTT by inspecting traffic flowing over the wire as well. One ExtraHop customer uses comparative RTT measurements to detect subtle network architecture problems: a problem with the Spanning-Tree Protocol, for example, caused half their traffic to dogleg through a single, under-provisioned switch, introducing an additional 120ms of latency along the way. The problem went undetected for months until they began measuring RTT across hosts!

Retransmission timeout and round-trip time metrics can reveal subtle network performance issues.
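As a rough illustration of measuring RTT from the wire, the sketch below pairs each data segment from a client with the first server acknowledgment that covers it. The `Packet` record and its fields are hypothetical, the capture point is assumed to sit close to the client, and samples for retransmitted segments are discarded because it is ambiguous which copy the acknowledgment answers. This is a simplified sketch, not a description of how the ExtraHop system computes RTT.

```python
from typing import Dict, List, NamedTuple, Optional


class Packet(NamedTuple):
    time: float         # capture timestamp in seconds
    from_client: bool   # True if sent by the client, False if by the server
    seq: int            # sequence number
    length: int         # payload length in bytes
    ack: int            # acknowledgment number


def rtt_samples(packets: List[Packet]) -> List[float]:
    """Pair each client data segment with the first server ACK covering it."""
    # Maps the ACK number that would cover a segment to its send time.
    # A value of None marks a retransmitted segment whose sample is discarded.
    pending: Dict[int, Optional[float]] = {}
    samples = []
    for p in packets:
        if p.from_client and p.length > 0:
            expected_ack = p.seq + p.length
            if expected_ack in pending:
                pending[expected_ack] = None  # retransmission: ambiguous sample
            else:
                pending[expected_ack] = p.time
        elif not p.from_client:
            # A cumulative ACK covers every pending segment up to p.ack.
            for expected_ack in sorted(k for k in pending if k <= p.ack):
                sent = pending.pop(expected_ack)
                if sent is not None:
                    samples.append(p.time - sent)
    return samples


# Hypothetical capture: the second segment is retransmitted, so only the
# first segment yields a usable sample.
capture = [
    Packet(0.000, True, 1, 100, 0),
    Packet(0.050, False, 0, 0, 101),
    Packet(0.060, True, 101, 100, 0),
    Packet(0.100, True, 101, 100, 0),
    Packet(0.150, False, 0, 0, 201),
]
print(rtt_samples(capture))
```

Comparing samples like these across hosts is the kind of measurement that would expose an unexpected detour through a slow switch.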

Monitoring Internal Server Performance with TCP Metrics

TCP also tells us how a server feels. Three metrics reflect conditions internal to the server itself: Aborts, Throttling, and Zero Windows.

A TCP connection is aborted when it is neither explicitly closed nor implicitly timed out: it is simply abandoned midstream. Since an Abort occurs at Layer 4, comfortably removed from the underlying network, it cannot be the result of faulty network hardware. Rather, an Abort is a direct message from application code: "I give up." Abort counts usually correlate with either application errors or intermediary proxies meddling in the end-to-end connection. The ExtraHop system also analyzes other Application-layer protocols so you can see the related HTTP status codes, storage errors, and SQL errors.
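On the wire, an abandoned connection commonly shows up as a segment with the RST flag set. Assuming that is the signal being counted, the sketch below reads the flags byte from the fixed 20-byte TCP header using Python's `struct` module. The `build_header` helper and its port and sequence values are hypothetical and exist only to produce test input.

```python
import struct

# Fixed 20-byte TCP header in network byte order: source port, destination
# port, sequence number, acknowledgment number, data offset, flags, window,
# checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")
RST_FLAG = 0x04
ACK_FLAG = 0x10


def is_reset(raw_header: bytes) -> bool:
    """Return True if the TCP header has the RST flag set."""
    fields = TCP_HEADER.unpack(raw_header[:TCP_HEADER.size])
    flags = fields[5]
    return bool(flags & RST_FLAG)


def build_header(flags: int) -> bytes:
    """Build a made-up header for demonstration purposes only."""
    data_offset = 5 << 4  # five 32-bit words, no options
    return TCP_HEADER.pack(443, 51000, 1, 1, data_offset, flags, 65535, 0, 0)


print(is_reset(build_header(RST_FLAG | ACK_FLAG)))  # reset segment
print(is_reset(build_header(ACK_FLAG)))             # ordinary acknowledgment
```

Counting such segments per server, and lining the counts up with application-layer errors, is one way to connect an Abort to the code that gave up.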

Throttling is a side effect of flow control in TCP. (For an overview of TCP flow control, see this relevant Wikipedia article.) Essentially, TCP provides a signaling mechanism by which the receiver can indicate its readiness for more data. If the receiver is busy processing the data it has already received—if a buffer is full, for example, or a database is writing to disk—it will signal the sender to slow the data transfer, giving the receiver time to catch up. Very considerate!

Although Throttling, like Retransmission, is normal, excessive throttling indicates that something is amiss on the receiver side. It could be starved for computing resources or it might be dependent on a third service that is bottlenecking the chain. In the latter case, we can inspect that third dependency to diagnose the root cause of the slowdown.
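The effect of a slow receiver can be illustrated with a toy model. In the sketch below, the receiver advertises whatever space is left in its buffer, the sender never sends more than that, and the receiving application drains the buffer at a fixed rate. All names and numbers are illustrative assumptions; real TCP stacks add window scaling, delayed acknowledgments, and congestion control on top of this.

```python
from typing import List


def simulate_flow_control(buffer_size: int, send_rate: int,
                          drain_rate: int, ticks: int) -> List[int]:
    """Return the window advertised by the receiver at the start of each tick."""
    buffered = 0
    advertised = []
    for _ in range(ticks):
        window = buffer_size - buffered    # free space the receiver advertises
        advertised.append(window)
        buffered += min(send_rate, window)     # sender is limited by the window
        buffered -= min(drain_rate, buffered)  # application reads from the buffer
    return advertised


# The sender offers 1500 bytes per tick, but the application reads only 500.
windows = simulate_flow_control(buffer_size=4000, send_rate=1500,
                                drain_rate=500, ticks=6)
print(windows)
```

With these numbers the advertised window shrinks from 4000 to 500 bytes and stays there, so the sender ends up throttled to the receiver's drain rate.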

Zero Windows are the extreme case of TCP Throttling. A Zero Window message signals that the receiver is completely overwhelmed and the sender should send no data until further notice. Application delivery grinds to a complete standstill. The receiver may eventually recover or, in the worst case, the connection will be aborted and the transaction will fail completely.
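One simple way to quantify the standstill is to add up how long a receiver kept its advertised window at zero. The sketch below assumes a hypothetical list of `(timestamp, advertised window)` observations for one receiver, sorted by time. A stall that is still open at the end of the list is not counted.

```python
from typing import List, Tuple


def zero_window_stall_time(observations: List[Tuple[float, int]]) -> float:
    """Return the total seconds spent with a zero advertised window."""
    total = 0.0
    stall_start = None
    for timestamp, window in observations:
        if window == 0 and stall_start is None:
            stall_start = timestamp          # receiver just closed its window
        elif window > 0 and stall_start is not None:
            total += timestamp - stall_start  # receiver reopened its window
            stall_start = None
    return total


# Hypothetical observations: two stalls, lasting 2.0 and 0.5 seconds.
seen = [(0.0, 8192), (1.0, 0), (1.5, 0), (3.0, 4096), (4.0, 0), (4.5, 2048)]
print(zero_window_stall_time(seen))
```

Because a window of zero is zero regardless of any window scale factor negotiated at connection setup, this check works on the raw header value.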

Zero Windows have myriad causes, but they are all internal to the receiving host; they are not the result of bad networks. We recently helped a customer who had linked Zero Windows to a revenue-impacting outage: when Zero Windows went up, their website went down. Zero Windows are never the source of a problem in and of themselves. Like Aborts and Throttling, they are TCP's response to host-level problems. In this customer's case, we helped them trace the Zero Windows back to an application architecture that guaranteed database deadlocks. Tellingly, their database profiling software had failed to catch the deadlocks. TCP analysis to the rescue!

Pairing TCP Analysis with Application Protocol Analysis

Want to learn more about the protocols that ExtraHop can monitor and decode, including DNS, SIP, HTTP, FTP, memcache, and any TCP or UDP-based protocol? Check out our Protocol Support Page.

Eric Thomas

Director of Cloud Product Marketing

Eric Thomas serves as Director of Cloud Products for IT analytics company ExtraHop. Prior to taking this role, Eric led the ExtraHop professional services team, and draws on over 20 years of experience in IT operations.

Before joining ExtraHop, Eric performed a variety of operational roles, most recently as director of advanced engineering for Thomson Reuters, where he led a team of performance and availability specialists, supporting over 200 applications representing $2B in annual revenue. His prior experience includes enterprise IT management, SaaS production operations, and next-generation technology advocacy.
