Silly Window Syndrome (SWS) is a problem that can arise in poor implementations of the Transmission Control Protocol (TCP) when the receiver is only able to accept a few bytes at a time or when the sender repeatedly transmits data in small segments. The resulting flood of small packets, or tinygrams, can significantly reduce network performance, and it often points to an overloaded server or to a sending application that is limiting throughput.
In TCP, as data is transmitted, the receiver replies with acknowledgements that, among other values, specify a window size: the number of bytes it is currently able to receive. The sender uses this to compute a "usable window" by subtracting the amount of data it has sent but not yet had acknowledged from the window size provided by the receiver. This is the sliding window algorithm, which TCP uses for flow control.
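As a minimal sketch of that subtraction (the function and parameter names are illustrative, not taken from any TCP stack), the usable window can be computed like this:

```python
def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver.

    advertised_window: window size from the receiver's latest acknowledgement.
    bytes_in_flight:   data already sent but not yet acknowledged.
    """
    # Clamp at zero: the receiver may shrink its window while data is in flight.
    return max(0, advertised_window - bytes_in_flight)


if __name__ == "__main__":
    print(usable_window(4096, 1024))
    print(usable_window(4096, 4000))
```

With a 4096-byte advertised window and 1024 bytes in flight, the sender may transmit 3072 more bytes. With 4000 bytes in flight, only 96 bytes remain, which is the kind of "silly" window discussed next.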
In certain situations and without preventative measures in place, the sliding window protocol can lead to SWS when the usable window shrinks to a "silly" size and increasingly small segments are sent (for reasons discussed below), to the point where packet headers exceed the amount of data in the packets. The greater number of packets being sent, each with its own TCP header, dramatically increases the overhead even as the amount of actual data sent decreases, leading to network congestion and a large loss of efficiency from degraded throughput.
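To put rough numbers on that overhead, the sketch below assumes 40 bytes of headers per packet (a 20-byte IPv4 header plus a 20-byte TCP header, with no options). That figure is an assumption for illustration; real packets may carry options and link-layer framing on top of it.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def payload_efficiency(payload_bytes: int, header_bytes: int = HEADER_BYTES) -> float:
    """Fraction of the bytes on the wire that are application data."""
    return payload_bytes / (payload_bytes + header_bytes)


if __name__ == "__main__":
    for payload in (1, 40, 536, 1460):
        print(f"{payload:>5} B payload: {payload_efficiency(payload):.1%} data")
```

Under these assumptions a 1-byte payload is only about 2% data, a 40-byte payload is exactly half data, and a full 1460-byte segment is over 97% data.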
What Causes Silly Window Syndrome from the Sender Side?
On the sender's side, silly window syndrome can be caused by an application that only generates very small amounts of data to send at a time. Even if the receiver advertises a large window, a naive TCP implementation sends each small piece of data as its own segment instead of buffering the data as it comes in and sending it in one larger segment.
A Common Solution
Nagle's algorithm is one of the most common ways of dealing with silly window syndrome, but the algorithm is still widely misunderstood and requires some tuning and optimization to make it work correctly in most environments. Here's what happens in a TCP transaction when you have Nagle's algorithm turned on:
- The first segment is sent regardless of size.
- Next, if both the receive window and the amount of data waiting to be sent are at least the maximum segment size (MSS), a full MSS segment is sent.
- Otherwise, if the sender is still waiting on the receiver to acknowledge previously sent data, the sender buffers its data until it receives an acknowledgement and then sends another segment. If there is no unacknowledged data, any available data is sent immediately.
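The three rules above can be sketched as a single decision function. This is a simplified illustration, not code from a real TCP stack; the function and parameter names are hypothetical, and timers, retransmission, and PSH handling are ignored.

```python
def nagle_should_send(pending_bytes: int, usable_window: int,
                      unacked_bytes: int, mss: int) -> bool:
    """Decide whether a sender using Nagle's algorithm transmits now.

    pending_bytes: data the application has written but TCP has not yet sent.
    usable_window: bytes the receiver can currently accept from us.
    unacked_bytes: data sent earlier that has not been acknowledged yet.
    mss:           maximum segment size for the connection.
    """
    if pending_bytes == 0:
        return False  # nothing to send
    if pending_bytes >= mss and usable_window >= mss:
        return True   # a full-sized segment can go out right away
    if unacked_bytes > 0:
        return False  # small data: hold it until the outstanding data is ACKed
    return usable_window > 0  # nothing in flight: send what we have


if __name__ == "__main__":
    print(nagle_should_send(10, 4096, 0, 1460))     # idle connection, small write
    print(nagle_should_send(10, 4096, 10, 1460))    # small write, data in flight
    print(nagle_should_send(2000, 4096, 10, 1460))  # a full segment is available
```

The first rule in the list (the first segment is sent regardless of size) falls out of the last branch, because a new connection has no unacknowledged data.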
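While Nagle's algorithm improves bandwidth efficiency, it adds latency: when data arrives in pieces smaller than the MSS, at most one such small segment is sent per round-trip time, because the next one has to wait for the previous acknowledgement. Applications that need data sent immediately, such as interactive or real-time protocols, usually turn Nagle's algorithm off, typically by setting the TCP_NODELAY socket option.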
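On most platforms Nagle's algorithm is disabled per socket with the TCP_NODELAY option. The snippet below is a minimal sketch using Python's standard socket module; the helper name make_low_latency_socket is made up for this example, and the socket is never connected.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm turned off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small writes without waiting for ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    sock = make_low_latency_socket()
    # A nonzero value means the option is set (the exact value varies by platform).
    print("Nagle disabled:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    sock.close()
```

Turning the option on trades bandwidth efficiency for latency, so it is usually worth pairing with application-level batching of small writes.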
For a real-life analogy, let's say we have a couple of moving trucks (packets) taking furniture (data) from one house to another. If a truck transported each piece of furniture as soon as it was taken out of the old house, one piece at a time, clearly the operation would take forever (SWS). If we have enough trucks in transit between locations, there's going to be a fair bit of congestion on the route as well. The obvious and more efficient solution is, of course, to wait until each truck is full before it drives off to the new house to avoid the large overhead of the drive time and loading/unloading time of each truck's trip.
What Causes Silly Window Syndrome from the Receiver Side?
If the receiver processes data more slowly than the sender transmits it, its buffer fills up and eventually the usable window becomes smaller than the maximum segment size (MSS) that the sender is allowed to send. However, since the sender wants to get its data to the receiver as quickly as possible, it immediately sends a smaller packet to match the usable window. As long as the receiver continues to consume data at a slower rate, the usable window, and therefore the transmitted segments, will get smaller and smaller.
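A toy simulation shows this shrinking. The model is deliberately simplified and every number in it is an assumption: time advances in ticks, the sender transmits once per tick as much as the window allows (up to one MSS), and the receiving application then reads a fixed number of bytes.

```python
def simulate_naive_receiver(capacity: int, mss: int,
                            drain_per_tick: int, ticks: int) -> list[int]:
    """Return the segment sizes sent when the receiver advertises every free byte."""
    buffered = 0          # bytes sitting in the receive buffer
    segments = []
    for _ in range(ticks):
        window = capacity - buffered          # receiver advertises all free space
        segment = min(mss, window)            # eager sender fills whatever is open
        if segment > 0:
            buffered += segment
            segments.append(segment)
        buffered -= min(drain_per_tick, buffered)  # slow application reads a little
    return segments


if __name__ == "__main__":
    print(simulate_naive_receiver(capacity=1000, mss=500, drain_per_tick=100, ticks=6))
```

In this run the segment sizes are 500, 500, 200, 100, 100, 100: once the buffer fills, every segment is only as large as the small amount the application just read.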
Two receiver-side techniques minimize the likelihood of silly window syndrome being caused on the receiver side:
- When the receiver's window becomes too small, the receiver advertises a window of zero instead of the small amount of free space. It only reopens the window once enough space has opened up in its buffer to accept a maximum-sized segment or once its buffer is at least half empty.
- Instead of sending acknowledgments that contain the updated receive window as soon as the window opens up, the receiver can delay its acknowledgments. This reduces network traffic, since TCP acknowledgments are cumulative and one acknowledgment can cover several segments, but the delay must be kept short enough that the sender does not time out and retransmit segments (RFC 1122 requires it to be less than 0.5 seconds).
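The window-advertisement rule in the list above can be sketched as follows. This is a simplified illustration with hypothetical names; a real stack tracks window edges rather than a single free-space number.

```python
def advertised_window(free_space: int, buffer_size: int, mss: int) -> int:
    """Window the receiver advertises under the receiver-side SWS avoidance rule.

    free_space:  bytes currently free in the receive buffer.
    buffer_size: total size of the receive buffer.
    mss:         maximum segment size for the connection.
    """
    # Reopen the window only when a full segment fits or half the buffer is free,
    # whichever threshold is smaller.
    threshold = min(mss, buffer_size // 2)
    return free_space if free_space >= threshold else 0


if __name__ == "__main__":
    print(advertised_window(free_space=100, buffer_size=8192, mss=1460))
    print(advertised_window(free_space=1460, buffer_size=8192, mss=1460))
```

With 100 bytes free the receiver advertises zero, so the sender waits. Once 1460 bytes are free it advertises the full 1460, and the sender can transmit a maximum-sized segment.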
Going back to our example, let's say that once the moving trucks are unloaded, there's only one person moving the furniture into the new house. If he's moving too slowly, the furniture will pile up in front of the house and there won't be any space left for subsequent trucks to drop off their furniture. If his solution is to tell the truck drivers to start bringing fewer and fewer pieces of furniture each trip to give him a chance to keep up, he'll run into the same problem as before: a lot of trucks traveling between houses with only a few items each. If he instead doesn't request additional furniture until he's moved more of it into the house, he can ask for a fully loaded truck as soon as he has room, rather than receiving the same amount of furniture spread across a larger number of trucks.
Silly window syndrome is an avoidable problem, but it happens, and when it does, it pays to know where to look for the cause, and what kind of troubleshooting you can do to make it better.