A couple of days ago, I wrote an article about congestion control on the internet. In the comments, I promised a more technical exploration of the subject, and this is it.

Congestion is a problem on the internet. It will always be a problem on the internet, just as it is on our roadways. Yes, bandwidth keeps increasing. So does the number of users, and the number of network applications per user, and the required bandwidth per application. Supply will never stay ahead of demand for long, if at all. Sooner or later, it will become logistically or economically infeasible for service providers to add more bandwidth all throughout their networks. Sooner or later, enough people will be sending enough traffic over the network’s weakest links that it won’t matter if other parts of the network are overbuilt. The “more bandwidth” crowd will lose to those who recognized the need for congestion control and did something about it.

When thinking about how to handle congestion, it’s possible to look at things from many different perspectives. Here are some examples:

  • Different kinds/definitions of fairness.
  • The core of the network vs. the edge.
  • Distinguishing flows vs. hosts vs. “economic entities” (i.e. users).

To keep this from turning into a major dissertation, I’m going to say little about these.

  • On fairness, I think Briscoe has really said all that needs to be said about flow-rate fairness vs. cost fairness. Note that “flow-rate fairness” does not necessarily refer to flows in the sense that they’re often discussed in the congestion-control literature. It refers to the rate at which things (packets) flow, whether they’re associated with connections or hosts or anything else. It’s an unfortunate re-use of terminology, but that’s life. I think cost fairness is the proper goal/standard for congestion control. If you disagree, keep it to yourself. I might discuss things in terms of cost fairness, but for the most part the concerns I’ll address are directly translatable to a flow-rate-fairness world and I don’t want to get bogged down trying to make the flow-rate zealots happy.
  • As for the core vs. the edge, I bring it up because they’re very different environments calling for very different kinds of congestion management. While routers at the edge can reasonably hope to make flow/host/user distinctions (modulo the mess that NAT introduces), routers at the core have no such hope. They must also process packets at a much higher rate, so the most they can do is make hopefully-useful guesses about what packets to drop when they’re overloaded. The dumbest approach is simply to drop the last packets to arrive. The various flavors of RED (Random Early Detection) are slightly better (see the sketch just below this list), and I believe Stochastic Fair Blue (which I’ve written about before) is even better, but they’re all still basically guesses. While congestion control in the core is a fascinating subject, it’s not what I’ll be writing about here.
  • On flows vs. hosts vs. users, most of the people weighing in on these issues have tended to focus on users, and I’ll follow suit. NAT makes a hopeless muddle of the whole thing anyway, so the only thing you can really be sure of is one entity paying the bill for one physical line. There’s a lot of merit to the argument that if there are multiple users huddled behind NAT then that’s their problem anyway.
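
To make that “guessing” concrete, here is a minimal sketch of the RED idea in Python: start dropping packets probabilistically, with a probability that rises as the smoothed queue length grows, before the queue is actually full. Real RED implementations also track packets since the last drop, handle idle periods, and so on, and the thresholds below are illustrative rather than tuned:

    import random

    class RedQueue:
        """Minimal sketch of Random Early Detection (RED) drop logic."""

        def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.2):
            self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
            self.weight = weight  # EWMA weight for the average queue length
            self.avg = 0.0        # smoothed queue length
            self.queue = []

        def enqueue(self, packet):
            # Smooth the queue length so short bursts aren't punished.
            self.avg = ((1 - self.weight) * self.avg
                        + self.weight * len(self.queue))
            if self.avg < self.min_th:
                self.queue.append(packet)  # no sign of congestion: accept
                return True
            if self.avg >= self.max_th:
                return False               # sustained overload: drop
            # In between, drop with probability rising linearly to max_p.
            p = (self.max_p * (self.avg - self.min_th)
                 / (self.max_th - self.min_th))
            if random.random() < p:
                return False
            self.queue.append(packet)
            return True

Note that nothing in this logic knows anything about flows, hosts, or users; that blindness is exactly the limitation described above.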

OK, enough stage-setting. On with the show.

Detection and Notification

Congestion control actually involves two separable problems: detecting congestion and then responding to it. Resolving congestion ultimately means getting senders to send less, so we’ll look at it from the sender’s perspective. How does a sender know that its packets have been dropped due to congestion? The worst way is to infer that something was dropped because an expected reply did not come back within an expected interval. The reply might not come back for reasons unrelated to congestion, of course, and the interval might have been miscalculated, but this method nonetheless dominated the internet until quite recently. The slightly less bad way to detect congestion is to piggyback explicit congestion notification (ECN) onto replies. This is at least quicker than waiting for timeouts, and avoids some of the problems I just mentioned, but it’s still not exactly foolproof. For one thing, the information provided by ECN is interpreted by the endpoint, which means a user’s computer. Unfortunately, the response to ECN is only defined for TCP (not UDP) and only takes effect at the connection (rather than host) level. As Briscoe points out, applications that use many connections often respond by simply shifting their work to other connections, which may reintroduce congestion either at the same place as before or elsewhere. That’s not helpful.
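
For concreteness, here is roughly where the ECN signal lives according to RFC 3168: a congested router marks packets instead of dropping them, and the receiver echoes the mark back to the sender. This Python fragment just decodes the relevant bits; it is nowhere near a full TCP implementation:

    # The ECN field is the low two bits of the IP TOS / traffic-class byte
    # (RFC 3168). A congested router sets CE ("congestion experienced") on
    # packets from ECN-capable senders instead of dropping them.
    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

    def ecn_codepoint(tos_byte):
        return tos_byte & 0b11

    # TCP flag bits used to carry the echo back to the sender.
    ECE, CWR = 0x40, 0x80

    def receiver_should_set_ece(tos_byte):
        # A TCP receiver that sees CE keeps setting ECE on its ACKs until
        # the sender acknowledges the signal by setting CWR.
        return ecn_codepoint(tos_byte) == CE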

The fact that ECN only operates at the connection level brings us to another important question: how should congestion discovered on one connection affect others? On the one hand, if other connections go to the same place (or nearby) they will be subject to the same congestion, and information should be “shared” between them. On the other hand, it would be a mistake to throttle the entire user/application in response to ECN on one connection, because other connections may go to completely different places through completely different routes. Basically, the more accurately you can pinpoint the location of the congestion, the better you can determine which of your connections are affected. At one extreme you could attribute congestion to only one host; at the other you could indict “the internet” in general. An interesting in-between possibility that might be worth exploring would be to indict the AS (“autonomous system” = the entity of interest in BGP routing) or the route used to reach that AS, because those are readily discoverable and likely to correlate with the extent of the congestion.
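
Here is one way that in-between bookkeeping might look, as a purely hypothetical Python sketch. The as_for_destination() argument stands in for whatever AS lookup is available (a local BGP view, for instance), and the thirty-second horizon is an arbitrary placeholder:

    from collections import defaultdict
    import time

    # Hypothetical AS-level congestion attribution.
    last_congestion_by_as = defaultdict(float)

    def note_congestion(dest_addr, as_for_destination):
        # Called whenever a timeout or ECN mark implicates dest_addr.
        last_congestion_by_as[as_for_destination(dest_addr)] = time.time()

    def inherits_congestion(dest_addr, as_for_destination, horizon=30.0):
        # Connections whose destinations resolve to a recently congested
        # AS share the signal; connections on unrelated routes do not.
        last = last_congestion_by_as[as_for_destination(dest_addr)]
        return (time.time() - last) < horizon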

The main problem with ECN, though, is that the network shouldn’t rely on users’ machines to manage congestion. Even if those machines don’t simply throw away ECN information or try to “game the system” in other ways (which many demonstrably do), and even if they try to be well-behaved and implement tolerable solutions to the problems mentioned above, there are just too many of them and they’re too uncoordinated, especially since they’re fundamentally in competition with one another. The network entities best situated to bear most of the congestion-control load are the “edge routers”: the ones owned by ISPs that interface between users on one side and the ISP’s own network (thence the rest of the internet) on the other. They should be looking for congestion indicators too, responding to congestion on their own if the endpoints don’t seem to be doing so properly. As a first step, that means paying attention to ECN flags. To handle UDP and the other cases where ECN as it currently exists doesn’t work, they should also participate in other mechanisms, invisible to users and requiring no effort on their part, to feed congestion information back. Imagine, for example, that a certain destination ISP/AS is experiencing congestion and can identify a host within a particular source ISP/AS as a significant contributor. It would be extremely useful if that information could get back to the host’s corresponding edge router, which could then ensure that appropriate action is taken.
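
To sketch what that might look like at the source edge router, here’s a Python fragment. Everything in it is invented for illustration: the report format, the out-of-band channel it arrived on, and the rate-limiter API:

    # Hypothetical handler for a congestion report from a remote ISP/AS.
    def on_congestion_report(report, limiter):
        # The report comes from the destination network, not from either
        # endpoint, so this works for UDP and for hosts that ignore ECN.
        host, congested_prefix, drop_rate = report
        # Throttle only this host's traffic toward the congested prefix,
        # leaving its other traffic (and other subscribers) alone.
        limiter.set_drop_rate(src=host, dst_prefix=congested_prefix,
                              rate=drop_rate)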

Response

What does “appropriate action” mean? Before I answer that, I’d like to give an example of what it doesn’t mean. Imagine you’re an edge router connected to two subscribers A and B on one side, and to two remote locations X and Y on the other. Subscriber A is busy sending to X, whereas B is mostly idle but occasionally sends a burst to Y. Now you start finding out that there’s severe congestion near X, and almost all of A’s packets are being dropped. How do you respond? According to “fair queuing,” you would respond by accepting A’s and B’s packets at the same rate, cutting B’s burst bandwidth even though A’s packets are almost certainly a waste of everyone’s time (or worse, since they continue contributing to the congestion). That hardly seems fair to me. Instead of trying to equalize the packet-acceptance rate, how about trying to equalize the actual-delivery rate? As long as A keeps sending you packets that are likely to be dropped anyway, why not drop them early (saving everyone else the effort of passing them on before they get dumped) and use the savings to let B send at his full burst rate? If A actually responds to the congestion by throttling back his sends to the congested area (and perhaps sending more elsewhere instead), the congestion indicators will stop coming in and we’ll be back to status quo ante.
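
Here is the germ of that policy in Python. The downstream_drop_rate() function is a hypothetical lookup fed by whatever congestion feedback reaches this router, and notice that the policy does nothing at all until local capacity runs out (more on that in the next paragraph):

    # Sketch of the drop-early idea: when we must drop something, shed the
    # packet least likely to survive the rest of the trip.
    def admit(packet, queue, capacity, downstream_drop_rate):
        if len(queue) < capacity:
            queue.append(packet)   # no local overload: accept everything
            return True
        # Overloaded: pick the victim with the worst downstream prospects,
        # whether that's the new arrival or something already queued.
        victim = max(queue + [packet],
                     key=lambda p: downstream_drop_rate(p.dst))
        if victim is packet:
            return False
        queue.remove(victim)
        queue.append(packet)
        return True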

Note that none of this ever matters unless you’re dropping packets due to local capacity constraints anyway. It’s a way of apportioning the overload effects, not of reducing apparent capacity while actual capacity remains. To be more precise, let’s continue the you-as-edge-router example from above. Let’s say that during B’s bursts you’re getting 1000 packets per second, evenly split between the two, and you’re dropping 20% due to simple lack of buffer space. At the same time, you know that 80% of A’s packets are being dropped further on due to congestion, and none of B’s are. “Fair queuing” would require you to ignore that last fact, allocating equal-length queues to A and B. You’ll effectively be dropping 20% of A’s packets and 20% of B’s packets, which is pretty much exactly what would have happened with a single queue, so you’ve achieved exactly nothing. B’s actual packet-delivery rate is still 400 packets per second when it could have been 500, just so you can pass more of A’s traffic on to where it’s being thrown away. This kind of fairness can work when the congestion is local and link flow control prevents packets from getting off queues at all, but in the real world of the internet the drops might be happening very far away, and in that case it’s pretty useless. It addresses the wrong problem.

For contrast, let’s see what happens when we add some cost fairness. We know that we need to drop 20% of the packets sent to us overall, and we’ll apportion those drops according to the downstream drop rates. In this example, 100% of our downstream drops are for A, so A gets 100% of our local drops. In other words, all 200 of the packets per second we need to drop will be A’s (either by manipulating queue lengths or by other means) and none will be B’s. B is now getting his full burst rate delivered, an improvement of 100 pps, in return for which A is losing 20 pps: A’s delivered rate falls from 80 to 60 pps, since 80% of whatever we forward for A gets thrown away downstream anyway. It’s a net win for our own users collectively, plus the reduction in A’s packets to X might help alleviate the congestion around X, so it’s a win for the broader internet collective as well. We can actually do even better by sorting packets more carefully (e.g. according to source/destination pair instead of just source) and allocating drop rates in a more sophisticated way, but we’re already ahead of the game.
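
Checking the arithmetic from the last two paragraphs in Python:

    local_drop = 0.20          # we must shed 20% of 1000 pps locally
    a_in = b_in = 500          # pps from each subscriber during a burst
    a_downstream_drop = 0.80   # 80% of A's forwarded packets die near X

    # Fair queuing: equal queues, so each subscriber loses 20% locally.
    a_fq = a_in * (1 - local_drop) * (1 - a_downstream_drop)  # 80 pps
    b_fq = b_in * (1 - local_drop)                            # 400 pps

    # Cost fairness: all 200 local drops are charged to A.
    a_cf = (a_in - 200) * (1 - a_downstream_drop)             # 60 pps
    b_cf = b_in                                               # 500 pps

    print(b_cf - b_fq)   # B gains 100 pps of delivered traffic
    print(a_fq - a_cf)   # A loses only 20 pps of delivered traffic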

Conclusion

So, what have we learned? First, I hope I’ve shown that getting congestion information from where it is known to where it needs to be known is an important part of the problem, and one that is not very well solved by current approaches. Some additional mechanisms, probably involving “out of band” communication, could help. Second, even if the information problem is addressed, “fair queuing” as usually proposed won’t do anything very useful with that information. I’ve proposed one scheme, or at least the very germ of one, that I think will do better. There are probably other good schemes as well, some similar and some quite different but none of them based on faux fairness. For the sake of users and applications that are doing their best to cope with congestion instead of making it worse by circumventing or perverting the congestion-control mechanisms already in place, we should start exploring and moving toward such schemes.