If you want to really understand TCP optimization techniques, how to decide which to use, and how to implement them, you have come to the right place. This post was so long and rich we decided to split it into sections and give it a table of contents. Enjoy!
Table of Contents:
- TCP background information: Why Nagle's Algorithm and Delayed ACK were implemented and how they interact
- Nagle's algorithm and Delayed ACK do not play well together in a TCP/IP network
- What are TCP_NODELAY and TCP_QUICKACK, and what do they do?
- Resources on Delayed ACK, Nagle Delays, Tinygrams, Silly Window Syndrome, and other TCP issues
- Should I enable TCP_NODELAY?
- How do I figure out if I should enable TCP_NODELAY?
- How do I know if TCP_NODELAY is helping?
- How can I resolve the issues caused by Nagle's algorithm and Delayed ACKs?
1. TCP Background Information: Why Nagle's Algorithm and Delayed ACK Were Implemented, and How They Interact
Today's internet is a large and global TCP/IP network that sends web pages and huge files of all types across great distances. A lot has changed since the internet was initially built, when small academic and government networks largely used the Telnet and Network Control Program (NCP) protocols. The internet has grown exponentially since its inception, and as more types of traffic, devices, and protocols have come online, the importance of managing this traffic efficiently has grown as well.
When the TCP/IP stack took over from NCP as the dominant protocol suite in the early 1980s, leaving Telnet to more specialized purposes, there were finally settings available to optimize traffic flow and avoid congestion and data loss. Even now, though, it can be difficult to know when and how to use these settings. This article clarifies some of the best use cases for common TCP optimization settings and techniques, specifically Nagle's algorithm, TCP_NODELAY, Delayed ACK, and TCP_QUICKACK.
Nagle's algorithm, named after its creator John Nagle, is one mechanism for improving TCP efficiency by reducing the number of small packets sent over the network. The goal was to prevent a node from transmitting many small packets if the application delivers data to the socket rather slowly. If a process is causing many small packets to be transmitted, it may be creating undue network congestion. This is especially true if the payload of a packet is smaller than the TCP header data.
This is analogous to loading one dresser into a huge moving truck and then driving across town. Unless the dresser needs to get there immediately, you might as well wait and fill the truck up. That's what Nagle's algorithm does: it consolidates multiple small writes into a single TCP segment so that the ratio of header data to payload is more efficient. The TCP and IP headers together take up at least 40 bytes, and there are plenty of applications that can emit a single byte of payload. If your environment is configured to send data immediately, you could end up sending a 41-byte packet with only one byte of actual payload.
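To put numbers on the header-to-payload ratio, here is a minimal sketch. It assumes the 40 bytes of overhead are a 20-byte IPv4 header plus a 20-byte TCP header with no options; the function name `payload_efficiency` is made up for illustration.

```python
# Assumed minimum header sizes, with no IP or TCP options present.
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def payload_efficiency(payload_bytes: int) -> float:
    """Return the fraction of a packet's bytes that is application payload."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    total_bytes = IPV4_HEADER_BYTES + TCP_HEADER_BYTES + payload_bytes
    return payload_bytes / total_bytes


if __name__ == "__main__":
    # Compare a one-byte keystroke with progressively fuller segments.
    for size in (1, 40, 512, 1460):
        share = payload_efficiency(size)
        print(f"{size:>5}-byte payload: {share:.1%} of the packet is payload")
```

Under these assumptions, a one-byte payload makes up only 1/41 of the packet, while a 1460-byte payload makes up more than 97% of it.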
TCP delayed acknowledgment, or Delayed ACK, is another technique used by some TCP implementations to improve network performance and reduce congestion. Delayed ACK was invented to reduce the number of ACKs required to acknowledge segments, and with it the protocol overhead. Instead of acknowledging every received TCP segment immediately, the destination holds the ACK for up to the value of the delayed ACK timer, about 200-500 ms. Several ACKs may then be combined into a single response. Delayed ACK is essentially a bet by the destination that a new packet will arrive before the delayed ACK timer expires. In some circumstances, though, the technique can reduce application performance.
It is important to understand the performance impact on your applications when you're deciding which TCP optimization methods to implement.
Nagle's Algorithm and Delayed ACK were created around the same time, but due to lack of collaboration between the creators, they provided an incomplete and sometimes conflicting solution. John Nagle himself expressed frustration about the situation in a Hacker News thread on the topic, saying:
"That still irks me. The real problem is not tinygram prevention. It's ACK delays, and that stupid fixed timer. They both went into TCP around the same time, but independently. I did tinygram prevention (the Nagle algorithm) and Berkeley did delayed ACKs, both in the early 1980s. The combination of the two is awful."
2. Nagle's Algorithm and Delayed ACK Do Not Play Well Together in a TCP/IP Network
By default, Nagle's algorithm and Delayed ACK are both active on most hosts across networks, including the internet. Nagle's algorithm effectively allows only one small, unacknowledged packet per connection to be in flight at any given time. Combined with delayed ACKs, this tends to hold back traffic, which makes Nagle's algorithm undesirable in highly interactive environments.
For example, Delayed ACK holds back an ACK in the hope of combining it with another ACK or piggybacking it on outgoing data. But part of Nagle's algorithm depends on receiving an ACK before sending more data. Together they create a problem: Delayed ACK is waiting around to send the ACK while Nagle's algorithm is waiting around to receive it! This creates seemingly random stalls of 200-500 ms on segments that could otherwise be sent immediately and delivered to the receive-side stack and the applications above it.
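The sender-side half of this interaction can be sketched as a simple decision rule. This is a simplified illustration, not any real stack's implementation: it ignores the receive window and timers, and the function name `nagle_can_send_now` and its parameters are made up for the example.

```python
def nagle_can_send_now(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Simplified Nagle decision: may the sender transmit a segment right now?"""
    if queued_bytes <= 0:
        return False  # nothing is waiting to be sent
    if queued_bytes >= mss:
        return True  # a full-sized segment is always allowed out
    # A small segment goes out only when no earlier data is still unacknowledged.
    return unacked_bytes == 0


if __name__ == "__main__":
    mss = 1460
    # The first small write leaves immediately because nothing is in flight.
    print(nagle_can_send_now(queued_bytes=10, mss=mss, unacked_bytes=0))
    # The second small write is held until the first one is acknowledged.
    # If the receiver delays that ACK, the sender stalls for the same time.
    print(nagle_can_send_now(queued_bytes=10, mss=mss, unacked_bytes=10))
```

The second call is where the stall comes from: the sender will not release the small segment until an ACK arrives, and the receiver is holding that ACK back on its delayed ACK timer.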
In situations where you need your data to be transmitted immediately and one-way latency matters, such as when transmitting user interactions like keypresses or mouse movements from a client to a central server using Telnet, turning off Nagle's algorithm can make for a better user experience. For almost everything else, where only round-trip time matters rather than one-way latency, turning off Nagle's algorithm may not help.
Delayed ACKs can help in certain circumstances, such as when using the character echo option in Telnet. If the ACKs are tiny and don't use much bandwidth then Delayed ACK is not of much help. These intricacies make it tough to tell when to use Nagle's algorithm, Delayed ACK, and other TCP optimization options.
There's nothing in TCP to automatically turn Nagle's algorithm or Delayed ACKs off, so you have to understand your network well enough to choose the options that will provide the best performance.
3. What Are TCP_NODELAY and TCP_QUICKACK, And What Do They Do?
It is very important to understand the interactions between Nagle's algorithm and Delayed ACKs. The TCP_NODELAY socket option disables Nagle's algorithm on a socket, so data bypasses Nagle Delays and is sent as soon as it is available, whatever the packet size. To disable Delayed ACKs, use the TCP_QUICKACK socket option on platforms that support it, such as Linux.
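As a minimal sketch, this is roughly how both options can be set from Python's standard `socket` module. The helper name `configure_low_latency` is made up for illustration. TCP_QUICKACK is not available on every platform, and on Linux it is not a permanent setting, so the sketch treats it as best effort.

```python
import socket


def configure_low_latency(sock: socket.socket) -> None:
    """Disable Nagle's algorithm and, where supported, ask for immediate ACKs."""
    # TCP_NODELAY = 1 turns Nagle's algorithm off for this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # TCP_QUICKACK only exists on some platforms. On Linux the kernel may
    # leave quick-ACK mode again later, so applications that depend on it
    # often set it again after each receive.
    if hasattr(socket, "TCP_QUICKACK"):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
        except OSError:
            pass  # best effort: the platform rejected the option


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        configure_low_latency(s)
        enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        print("TCP_NODELAY enabled:", enabled)
```

Both options are set per socket, so they have to be applied to each connection that needs them.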
Enabling the TCP_NODELAY option turns Nagle's algorithm off. In the case of interactive applications or chatty protocols with a lot of handshakes such as SSL, Citrix and Telnet, Nagle's algorithm can cause a drop in performance, whereas enabling TCP_NODELAY can improve the performance.
In any request-response application protocol where the request data can be larger than a packet, the interaction between Nagle's algorithm and Delayed ACKs can artificially impose a few hundred milliseconds of latency between the requester and the responder, even if the requester has properly buffered the request data. In this case, the requester should disable Nagle's algorithm by enabling TCP_NODELAY. If the response data can be larger than a packet, the responder should also enable TCP_NODELAY so the requester can promptly receive the whole response.
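The following sketch shows one way a requester and a responder might both enable TCP_NODELAY in a request-response exchange. Everything about the protocol is hypothetical: the 4-byte length prefix, the loopback echo responder, and the helper names exist only to make the example self-contained.

```python
import socket
import threading


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_request(sock: socket.socket, body: bytes) -> None:
    """Requester side: disable Nagle and hand the whole request over at once."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    header = len(body).to_bytes(4, "big")  # hypothetical length-prefix framing
    sock.sendall(header + body)  # one buffered write instead of two small ones


def echo_once(listener: socket.socket) -> None:
    """Responder side: disable Nagle too, then echo the request body back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        length = int.from_bytes(recv_exact(conn, 4), "big")
        conn.sendall(recv_exact(conn, length))


def round_trip(body: bytes) -> bytes:
    """Run one request-response exchange over the loopback interface."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        responder = threading.Thread(target=echo_once, args=(listener,), daemon=True)
        responder.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            send_request(client, body)
            reply = recv_exact(client, len(body))
        responder.join(timeout=5)
        return reply


if __name__ == "__main__":
    reply = round_trip(b"x" * 3000)
    print("received", len(reply), "bytes")
```

Joining the header and body into a single `sendall` call keeps the requester from producing a small write followed by another write, which is the pattern most likely to get stuck waiting on a delayed ACK.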
4. More Resources on Nagle Delays, Delayed ACK, Tinygrams, and Silly Window Syndrome
- To Nagle or Not to Nagle, That is the Question
- Nagle delays explained
- What Is a Tinygram?
- Tinygrams explained
- What is Silly Window Syndrome
- TCP profile setting for the BIG-IP
5. Should I Enable TCP_NODELAY?
It really depends on your specific workload and the dominant traffic patterns on a service. Typically, Local Area Networks (LANs) have fewer issues with traffic congestion than Wide Area Networks (WANs).
If you are dealing with non-interactive traffic or bulk transfers such as SOAP, XMLRPC, or HTTP/web traffic, then enabling TCP_NODELAY to disable Nagle's algorithm is unnecessary.
Some contexts where Nagle's algorithm won't help and TCP_NODELAY should be enabled are:
- Highly interactive applications that communicate with a central server (Citrix, networked video games, etc.)
- Telnet-connected devices
- Applications using chatty protocols (Telnet, SSL)
6. How Do I Figure Out if I Should Enable TCP_NODELAY?
There is no simple rule of thumb, as this is very dependent on your traffic patterns and application mix, but here's a good test you can do if you have ExtraHop. Leave the ExtraHop Discover appliance (EDA) running to get some baseline data, then look at the TCP stats under your key switches. Are you seeing a high number of "tinygrams" (packets that contain a relatively small payload compared to the overhead of the headers required to transfer the data)?
If you see lots of tinygrams or a high number of Nagle Delays as a percentage of overall traffic, disable TCP_NODELAY so that Nagle's algorithm can reduce the tinygrams. Leave the EDA running for some time again, and then look at the tinygram count. If it is still very high, Nagle's algorithm is not reducing the tinygrams, so enable TCP_NODELAY.
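If you are working from a raw list of payload sizes rather than a dashboard, the tinygram share can be approximated with a few lines of code. This is only a sketch under stated assumptions: it counts a packet as a tinygram when its payload is smaller than 40 bytes of header overhead, which may not match how any particular monitoring product defines the metric, and the function name `tinygram_percentage` is made up for illustration.

```python
from typing import Sequence

# Assumed overhead: IPv4 header plus TCP header, with no options.
HEADER_OVERHEAD_BYTES = 40


def tinygram_percentage(payload_sizes: Sequence[int]) -> float:
    """Percentage of data-carrying packets whose payload is below the overhead."""
    # Pure ACKs carry no payload, so they are left out of the calculation.
    data_packets = [size for size in payload_sizes if size > 0]
    if not data_packets:
        return 0.0
    tiny = sum(1 for size in data_packets if size < HEADER_OVERHEAD_BYTES)
    return 100.0 * tiny / len(data_packets)


if __name__ == "__main__":
    baseline = [1, 2, 1460, 1460]  # hypothetical sample of payload sizes
    print(f"tinygrams: {tinygram_percentage(baseline):.1f}% of data packets")
```

Comparing this figure before and after each change gives a rough sense of whether a setting moved the tinygram share in the direction you wanted.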
Tuning tends to be an iterative process. It takes some experimentation to know whether you should enable TCP_NODELAY, and your needs will change over time as your networking stack and applications grow and change.
7. How Do I Know if TCP_NODELAY Is Helping?
After enabling TCP_NODELAY to disable Nagle's algorithm and going through the tuning process, if you see a very low number of Nagle Delays as a percentage of overall traffic and a very low number of tinygrams, then you know that enabling TCP_NODELAY is helping.
Conversely, if you see a high number of Nagle Delays as a percentage of overall traffic and a very high number of tinygrams, then enabling TCP_NODELAY is probably not the best fit for your use case.
8. How Can I Resolve the Issues Caused by Nagle's Algorithm and Delayed ACKs?
If you have been through the tuning process and are still seeing network congestion issues, you may have problems that can't be solved by tweaking your socket settings. However, there are a few more things to try before giving up:
- Enable TCP_NODELAY to disable Nagle's algorithm via global socket options on the servers
- Make profile tweaks on proxy servers and load balancers. This is especially relevant if you're running applications or environments that only sometimes have highly interactive traffic and chatty protocols. By dynamically switching Nagle's algorithm on and off with TCP_NODELAY at the load balancer level, you can keep even highly heterogeneous traffic mixes running optimally.
- Reduce the Delayed ACK timer on your servers and load balancers. Sometimes, this kind of optimization is handled in software, at the application level, but when that's not the case, you may still be able to dynamically manage the ACK timer at the server or load balancer level.
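At the socket level, switching Nagle's algorithm on and off per connection can be sketched as follows. This is an illustration only: how a given proxy or load balancer exposes the setting depends on the product, and the `apply_traffic_profile` and `nagle_enabled` helpers are hypothetical names.

```python
import socket


def apply_traffic_profile(conn: socket.socket, interactive: bool) -> None:
    """Turn Nagle's algorithm off for interactive traffic, back on for bulk."""
    # TCP_NODELAY = 1 disables Nagle's algorithm; 0 re-enables it.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if interactive else 0)


def nagle_enabled(conn: socket.socket) -> bool:
    """Report whether Nagle's algorithm is currently active on the socket."""
    return conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as conn:
        apply_traffic_profile(conn, interactive=True)
        print("interactive profile, Nagle enabled:", nagle_enabled(conn))
        apply_traffic_profile(conn, interactive=False)
        print("bulk profile, Nagle enabled:", nagle_enabled(conn))
```

Because the option can be changed at any point in a socket's lifetime, the same connection can be moved between profiles as its traffic pattern changes.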
As you're making these changes, keep a careful watch on your network traffic and see how each tweak impacts congestion.
At ExtraHop, we get to take a detailed look at plenty of enormous corporate networks, and you'd be surprised how often a major company has purchased hundreds of thousands of dollars in additional network gear unnecessarily because their core protocols, the TCP/IP stack, weren't optimized for their application traffic mix. It really pays to try optimizing your current environment before throwing more hardware at the problem.