Sunday, 13 December 2009

Wow! latency

I was asked to look at an article on improving interactive game play on World of Warcraft.

On reading it, it immediately set off bullshit alarms. It makes a number of claims that seem confused and in some cases are plain wrong. But some packet dumping shows there is some grain of truth in the claims.

The main claim is that TCP will send a packet and wait for an ACK before sending the next packet, and that Windows delays sending the ACK, so the next packet is delayed and things are slow. This is not in general how TCP works. TCP uses a sliding window. It can send many packets before expecting an ACK. Sending ACKs less often is fine, as each ACK covers all the data received so far. Delaying ACKs is normally very sensible.

Their claimed fix is a setting change on Windows to send ACKs with no delay (or less delay).

They claim that the time taken to send the ACK is how latency is measured by WoW. They don't back up this claim, and my guess is that it will be an application level keep alive (or ping) type message that is used to measure the latency. I can however see a mechanism for ACK delay to cause such a measure to be increased (see below).

They claim that this is nothing to do with the Nagle algorithm as that is turned off by Blizzard. I think they are saying it is turned off at the Windows end in the WoW client. This may or may not be true.
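For reference, the usual way for an application to turn Nagle off on its own end is the TCP_NODELAY socket option. A minimal sketch in Python, which says nothing about how the WoW client is actually written; the function name is made up for illustration:

```python
import socket

def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled on this end.

    Hypothetical helper for illustration only. TCP_NODELAY affects what
    this host sends; it does not change what the remote server does.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Note that this only controls one direction. A client setting it has no effect on whether the server bunches up its own small writes.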

They claim that sending ACKs immediately is not sending more data. They say Windows sends 2 ACKs per two segments, and with their fix you send one ACK per one segment, so just as many. This is plain wrong - an ACK will (in one ACK) acknowledge all data up to that point, so one ACK acknowledges two or more segments in that scenario. So what they propose definitely sends more ACKs.
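The counting is easy to check. A small sketch, assuming in-order delivery and an illustrative segment size of 1460 bytes (both function names are made up here):

```python
def cumulative_ack(initial_seq, segment_lengths):
    """Return the one ACK number that covers every in-order segment received."""
    next_expected = initial_seq
    for length in segment_lengths:
        next_expected += length
    return next_expected

def acks_sent(segments, ack_every):
    """Count ACKs when the receiver acknowledges once per `ack_every` segments."""
    return -(-segments // ack_every)  # ceiling division

# Two 1460 byte segments starting at sequence 1000 are covered by a single ACK.
single_ack = cumulative_ack(1000, [1460, 1460])

# For 10 segments: one ACK per two segments versus one ACK per segment.
delayed = acks_sent(10, 2)
immediate = acks_sent(10, 1)
```

Under these assumptions the per-segment setting sends twice as many ACKs for the same data.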

So let's get to the bit where their claims make sense... I did some packet dumps of WoW traffic on Windows (before and after applying the fix) and also on Linux (Wine). What I saw was :-

1. When sending a constant high volume stream of data the server will send multiple (full size) packets without waiting for an ACK. The ACKs are sent every few packets received and the data flows smoothly as TCP intended.

2. When in normal game play, after loading characters from a new location, etc, the data is different. The server has a steady stream of small blocks of data of the order of 40 bytes. I saw that Windows was delaying ACKs by around 200ms, and it seems that the server does run the Nagle algorithm.

3. Linux was working on maybe 1ms ACK delay :-)

This is where it gets interesting. The Nagle algorithm means you do not send more data until you get an ACK for the previous data (sort of what they claimed), but it only applies if what you have to send is less than one whole packet. This is, again, sensible networking practice.

However, if you combine this with a steady stream of tiny blocks of data and a 200ms delayed ACK, what you get is a packet sent, a delay of 200ms, an ACK, then a packet sent (multiple small blocks in one packet), and a delay of 200ms, an ACK, and so on.

I.e. the small blocks of data get bunched up in 200ms parcels, each of which may have dozens of small blocks of data.
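The bunching can be reproduced with a toy model. This is a rough sketch, not a real TCP stack: it assumes zero network transit time, one small block generated at a fixed interval, a sender that holds small data while anything is unacknowledged, and a receiver that ACKs after a fixed delay. The function and parameter names are invented for illustration.

```python
def simulate(block_interval_ms, ack_delay_ms, duration_ms):
    """Return a list of (send_time_ms, blocks_in_packet) for a Nagle sender."""
    packets = []
    pending = 0      # small blocks waiting in the send buffer
    ack_due = None   # time the ACK for the outstanding packet arrives
    for t in range(duration_ms + 1):
        if t % block_interval_ms == 0:
            pending += 1                  # application writes a small block
        if ack_due is not None and t >= ack_due:
            ack_due = None                # outstanding packet acknowledged
        if pending and ack_due is None:
            packets.append((t, pending))  # nothing in flight, so send now
            pending = 0
            ack_due = t + ack_delay_ms
    return packets

slow_acks = simulate(block_interval_ms=10, ack_delay_ms=200, duration_ms=1000)
fast_acks = simulate(block_interval_ms=10, ack_delay_ms=1, duration_ms=1000)
```

With a 200ms ACK delay the model sends one packet every 200ms carrying 20 blocks each (after the first). With a 1ms ACK delay every block goes out in its own packet as soon as it is written.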

Importantly, if one of those small blocks of data is a ping type request, it could be delayed up to 200ms before the client sees it. When the client has data to send it won't delay, so the ping reply would be instant, but the overall round trip time from the application level could be delayed up to 200ms. Similarly events in the game could be delayed up to 200ms.

So it is not that your request to hit something is delayed, it would not be, but the appearance of the thing you want to hit in the first place may be delayed by 200ms, which is not good.

Whilst I hear gamers argue over 1ms (which is silly), 200ms is long enough to make a difference!

So, what if you lower ACK delay? The answer is each small packet will get an ACK. This could mean sending dozens more ACKs (yes, increasing traffic sent). But it means massively reduced interactive delay. The ADSL line delay will mean some messages clumped together still because of the Nagle algorithm but there is nothing you can do about that unless Blizzard turn it off. They may have good reasons not to, as it would greatly increase traffic due to packet headers. Bear in mind, at 40 bytes of data at least half the packet is headers!
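The header overhead is simple arithmetic. A sketch assuming a 20 byte IPv4 header and a 20 byte TCP header with no options, and ignoring link layer framing (real packets often carry TCP options, which makes it worse):

```python
IP_HEADER_BYTES = 20   # assumed: IPv4, no options
TCP_HEADER_BYTES = 20  # assumed: no TCP options

def header_fraction(payload_bytes):
    """Fraction of an IP packet taken up by IP and TCP headers."""
    headers = IP_HEADER_BYTES + TCP_HEADER_BYTES
    return headers / (headers + payload_bytes)

small_block = header_fraction(40)     # one 40 byte game message
full_packet = header_fraction(1460)   # a full size data packet
```

For a 40 byte block the headers are half the packet. For a full 1460 byte payload they are under 3%.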

Will this improve game play? Maybe. The delay before was a jitter. It was not a fixed 200ms but random between 0ms and 200ms. Interacting would cause ACKs to be sent as part of normal packets being sent from the client. It is quite possible that in a battle there is enough traffic both ways not to make any real difference to the interactive latency. It is hard to say, and not easy to analyse without a dump of the application layer at Blizzard to compare. Basically it may help!
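To put rough numbers on the jitter: how long a block waits depends on where in the ACK delay cycle it was written. A simplified sketch, assuming blocks arrive at a fixed interval, no client traffic to carry an early ACK, and zero transit time (the function name is made up):

```python
def block_waits(block_interval_ms, ack_delay_ms):
    """Wait in ms for each block written during one ACK delay cycle.

    A block written at time t after the previous packet is held until
    the ACK arrives at ack_delay_ms, so it waits ack_delay_ms - t.
    """
    return [ack_delay_ms - t
            for t in range(block_interval_ms, ack_delay_ms + 1, block_interval_ms)]

waits = block_waits(block_interval_ms=10, ack_delay_ms=200)
average_wait = sum(waits) / len(waits)
```

In this model the wait ranges from 0ms to 190ms and averages 95ms, so roughly half the ACK delay on average rather than the full 200ms every time.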

One thing it does do is make the reported latency a lot lower and that probably has a big placebo effect on gamers :-)


  1. There's a fair amount of misinformation put out on gaming sites, as the people that write them don't really understand networking.. they just did some stuff and it 'worked'.

    I've lost count of the number of sites that list ports used by a game/console and don't seem to understand there's a difference between incoming and outgoing connections.. leading to people port forwarding port 80 to their PS3 for example.

  2. One place it does seem to help with, and of course this may just be a placebo, but while I was in Scotland last week with an unreliable wifi link, it did seem to make WoW more stable.

    I guess as there was a greater chance of the ACK actually getting through the packet loss.