RevK®'s ramblings: What is Packet Loss?

2014-02-08

What is Packet Loss?

The Internet uses a system of packets to send information. This means that whatever you are doing, whether accessing Facebook, making a Skype call, playing an on-line game, downloading a file or reading an email, the information is broken down into packets. These are not always the same size, and typically carry up to around 1500 bytes (or characters) of data at a time.

Each of these packets carries some addressing information, and some data. The fact that packets are used means it is possible to have lots of things happening at once, with bits of one thing in one packet followed by bits of something else in another packet and so on, mixing up multiple things on one Internet connection. This is how it is possible for lots of people to use an Internet connection at once. The addressing data in the packet makes sure the right things go to the right place and are put back together at the far end.
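To make that concrete, here is a minimal Python sketch of splitting two pieces of data into packets, mixing them on one link, and putting them back together at the far end. The field names (stream_id, seq), the function names and the flat 1500 byte limit are assumptions for illustration only; real IP packets carry their addressing in a binary header and do not look like this.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_PAYLOAD = 1500  # illustrative size limit, in bytes


@dataclass
class Packet:
    stream_id: str   # hypothetical stand-in for the addressing information
    seq: int         # position of this chunk within its stream
    payload: bytes


def packetize(stream_id, data, size=MAX_PAYLOAD):
    """Split data into packets carrying at most `size` bytes each."""
    return [Packet(stream_id, i, data[off:off + size])
            for i, off in enumerate(range(0, len(data), size))]


def interleave(*streams):
    """Mix packets from several streams onto one link, taking turns."""
    mixed = []
    for i in range(max((len(s) for s in streams), default=0)):
        for s in streams:
            if i < len(s):
                mixed.append(s[i])
    return mixed


def reassemble(packets):
    """Sort packets back into their streams and rejoin the data."""
    by_stream = defaultdict(list)
    for p in packets:
        by_stream[p.stream_id].append(p)
    return {sid: b"".join(p.payload for p in sorted(ps, key=lambda p: p.seq))
            for sid, ps in by_stream.items()}


email = b"E" * 4000   # becomes 3 packets
call = b"V" * 2500    # becomes 2 packets
link = interleave(packetize("email", email), packetize("call", call))
print([(p.stream_id, p.seq, len(p.payload)) for p in link])
print(reassemble(link) == {"email": email, "call": call})
```

The packets from the two things take turns on the link, and the addressing information is what lets the far end sort them out again.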

This is all very different to old-fashioned phone calls, which work on circuits. They work by creating a means to send data (e.g. voice) continuously at a specific speed between two points, reserving the capacity for that link for the duration of the call. You either manage to establish the call (the circuit), or not, at the start. Once you have it, you have the circuit in place until you finish. It is a very different way of working to packets.

One of the problems you get is where a link of some sort gets full.

With a circuit-based system like phone calls, a full link (i.e. one already carrying as many calls as it can) will mean you get an equipment engaged tone. The call fails to start.

However, with a packet-based system, when a link gets full you start with a queue of packets waiting to go down the link (adding delay) and ultimately you drop packets. That means the packets are thrown away. This can, and does, happen at any bottleneck anywhere in the Internet. The most likely place is where the Internet meets your own Internet connection, as that connection is usually the slowest link on the path and so creates a bottleneck.

So packet loss is normal. It is what happens when a link is full.
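A toy simulation shows the queue-then-drop behaviour of a full link. This is only a sketch: the run_link function, the tick-based timing and the fixed queue limit are made up for illustration, and real routers are more subtle than this.

```python
from collections import deque


def run_link(arrivals_per_tick, link_rate, queue_limit):
    """Simulate a link that can send `link_rate` packets per tick.

    arrivals_per_tick lists how many packets turn up in each tick.
    Returns (sent, dropped, longest_queue).
    """
    queue = deque()
    sent = dropped = longest = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            if len(queue) < queue_limit:
                queue.append(1)   # packet waits its turn (this adds delay)
            else:
                dropped += 1      # queue is full: packet is thrown away
        longest = max(longest, len(queue))
        for _ in range(min(link_rate, len(queue))):
            queue.popleft()
            sent += 1
    return sent, dropped, longest


# Link not full: nothing queues for long and nothing is dropped.
print(run_link([5] * 10, link_rate=10, queue_limit=20))    # (50, 0, 5)
# Link over-full: the queue builds up first, then packets are dropped.
print(run_link([15] * 10, link_rate=10, queue_limit=20))   # (100, 40, 20)
```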

The result of this packet loss depends on the protocol. The overall effect on any sort of data transfer, such as downloading a file, or sending an email, is that the transfer happens at a slower speed. The end points send packets of data at a slower speed so that they don't get dropped packets. Importantly, with a lot of protocols, the missed packets are re-sent which means the data does not have gaps in it.

Some protocols do not allow resending or slowing down. These include things like VoIP calls, such as Skype, where you can't slow down a phone call. What happens in such cases is you get gaps in the call: break-up, pops, etc.

Some systems are clever and decide which packets to drop when a link is full, giving protocols like VoIP a chance to get through and dropping packets for protocols that can back-off if needed. We do this in A&A, for example.
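One way to picture that kind of selective dropping is the sketch below. It is not a description of how A&A or anyone else actually does it: the enqueue function, the "voip" and "bulk" labels and the drop rule are all assumptions chosen to keep the example short.

```python
def enqueue(queue, packet, limit):
    """Add a packet to the queue, preferring to drop bulk traffic when full.

    `queue` is a list and `packet` is a (kind, label) tuple, where kind is
    "voip" or "bulk". Returns the packet that was dropped, or None.
    """
    if len(queue) < limit:
        queue.append(packet)
        return None
    if packet[0] == "voip":
        # Make room by dropping the most recently queued bulk packet, if any.
        for i in range(len(queue) - 1, -1, -1):
            if queue[i][0] == "bulk":
                victim = queue.pop(i)
                queue.append(packet)
                return victim
    return packet  # nothing suitable to drop: the arriving packet is lost


queue = []
for pkt in [("bulk", "b1"), ("bulk", "b2"), ("voip", "v1"), ("bulk", "b3")]:
    lost = enqueue(queue, pkt, limit=2)
    print(f"arrived {pkt[1]}, dropped {lost[1] if lost else 'nothing'}")
print(queue)
```

The bulk transfer sees the loss and backs off, while the call gets through.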

However, there is another scenario where you can get packet loss, and this is where there is a fault. In the case of a fault you will find some packets are dropped at random. What usually happens is some of the data in the packet is corrupted (changed) by random noise or errors from the fault, and this means that the packet no longer checks out when it gets to the other end. Packets have built-in checks to confirm nothing was changed, and if that check fails the packet is dropped.
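The check-and-drop step can be sketched in a few lines of Python. This uses CRC-32 from Python's zlib module purely as a stand-in; which check a real link uses depends on the protocol, and the seal and receive names are made up for this example.

```python
import zlib


def seal(payload):
    """Append a 4-byte CRC-32 check value to the payload before sending."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def receive(frame):
    """Return the payload if the check passes, or None (packet dropped)."""
    payload, check = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != check:
        return None
    return payload


frame = seal(b"hello, world")
print(receive(frame))             # arrives intact

corrupted = bytearray(frame)
corrupted[3] ^= 0x01              # noise flips a single bit in transit
print(receive(bytes(corrupted)))  # check fails, so the packet is dropped
```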

The effect of fault based packet loss depends on the protocol.

For protocols like VoIP, the dropped packet simply means break-up in the call. Even a low level of packet loss can mean annoying pops and gaps in the call.

For protocols that can back off and slow down, well, that is what they do. They cannot tell that the packet loss is the result of a fault and not of a full link, so they slow down. But even when they slow down, they still get packet loss as it is random. So they slow down even more. They don't understand the problem, and just assume that a link must be getting full no matter how slow they go.

Imagine driving a car with no speedo, but with a light that says "driving too fast". That is fine: when you see the light, you slow down, and you stop seeing the light. That means you drive at the right speed. But if the light is faulty and keeps saying "driving too fast" at random, you will slow down, and still see the light, so slow down more, and before you know it you are crawling along at walking speed.

This means that even low levels of random packet loss can massively slow down data transfers.
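The faulty warning light can be simulated. The sketch below is a very simplified model of the "speed up gently, halve on loss" behaviour (often called additive increase, multiplicative decrease), not real TCP; the average_window function, the capacity of 100 packets per round and the fixed random seed are assumptions for illustration.

```python
import random


def average_window(loss_rate, capacity=100, rounds=10_000, seed=1):
    """Average number of packets in flight per round for a toy sender.

    Each round the sender transmits `window` packets. If the link is
    over-full, or any packet is lost at random, it halves the window;
    otherwise it speeds up by one packet per round.
    """
    rng = random.Random(seed)
    window = 1.0
    total = 0.0
    for _ in range(rounds):
        total += window
        sent = int(window)
        link_full = sent > capacity
        # A random fault looks exactly like congestion to the sender.
        fault_loss = any(rng.random() < loss_rate for _ in range(sent))
        if link_full or fault_loss:
            window = max(1.0, window / 2)
        else:
            window += 1.0
    return total / rounds


clean = average_window(0.0)
faulty = average_window(0.02)
print(f"no random loss: about {clean:.0f} packets per round")
print(f"2% random loss: about {faulty:.0f} packets per round")
```

In this model the sender with 2% random loss never gets anywhere near the capacity of the link, because it keeps seeing the "too fast" light.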

Packet loss when a link is otherwise idle is a fault.

The problem is that when you measure packet loss you do not always know if the link is full or not. Your tests of packet loss, usually a protocol called ping, could be losing packets because a link is full sending an email, or it could be losing packets because of a fault.

The key is to measure packet loss when a link is otherwise empty of traffic, so that the only reason to drop packets is because of a fault.

The other problem with measuring loss is how you measure it. The normal measure is percentage loss. If you send 100 packets, how many arrive and how many are lost? This is fine, but random corruption causing loss will have a much higher chance of causing a packet to be lost if the packet is bigger. So you have to look at packet loss and packet size. From this you can work out a rate of corruptions on a link and predict the loss for other packet sizes.
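As a rough sketch of that calculation, assume corruption hits each byte independently with the same small probability, so a packet of n bytes survives with probability (1 - q)^n. That assumption is mine, it ignores headers, and it will not hold for every kind of fault. The predict_loss function name is made up for this example.

```python
def predict_loss(measured_loss, measured_size, target_size):
    """Predict the loss rate at one packet size from the loss at another.

    Assumes independent random corruption per byte, so the chance of a
    packet surviving falls off exponentially with its size.
    """
    survive = 1.0 - measured_loss
    return 1.0 - survive ** (target_size / measured_size)


# 0.1% loss measured with 64 byte packets, scaled up to 1500 byte packets.
loss_1500 = predict_loss(0.001, measured_size=64, target_size=1500)
print(f"predicted loss at 1500 bytes: {loss_1500:.1%}")   # about 2.3%
```

So loss that looks negligible with small test packets can be a couple of percent for full size packets.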

The best measure of loss as a simple percentage is the loss when sending full size packets (1500 bytes), which is what the data transfer protocols (like TCP) use. Even 1% or 2% loss of such packets can cause TCP to slow down massively. It does not work like taking away a couple of percent of speed - the data transfers keep slowing down as they keep thinking the line must be full.

2% loss is not like 98% working speed!
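A commonly quoted rule of thumb puts a number on this. It is the Mathis approximation, which is not from this post and only holds roughly, for small loss rates and long-running transfers: TCP throughput is at most about (segment size / round trip time) x 1.22 / sqrt(loss rate). The 20 ms round trip time and 1460 byte segment size below are assumed example values, and the function name is made up.

```python
import math


def tcp_throughput_limit(loss_rate, rtt_seconds=0.020, mss_bytes=1460):
    """Approximate upper bound on TCP throughput, in bits per second."""
    c = math.sqrt(3 / 2)  # the constant in the approximation, about 1.22
    return (mss_bytes * 8 / rtt_seconds) * c / math.sqrt(loss_rate)


for loss in (0.0001, 0.01, 0.02):
    mbps = tcp_throughput_limit(loss) / 1e6
    print(f"{loss:.2%} loss -> about {mbps:.1f} Mbit/s at best")
```

With these assumed figures, 2% loss limits a single transfer to roughly 5 Mbit/s however fast the line is, and the limit depends on the square root of the loss rather than on the line speed.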

A simpler and less intrusive measure of loss is a short LCP echo. LCP echoes are a normal part of most Internet links, and A&A do them every second and record the loss for every line. This is only a few bytes, and so packet loss that is a fraction of a percent could mean several percent at full packet sizes. This is why it is so important to take even very low levels of LCP echo loss seriously.

This is why packet loss needs to be a clear metric of quality and faults and why companies like BT need documented packet loss measures that are considered a fault. For some inexplicable reason such a simple metric is not part of any service level guarantee, and not considered a "fault" by BT!

Oddly, buying transit, which means sending and receiving packets from thousands of places all around the world (not just exchanges in the UK) and even laying cables under the ocean, one can get a service level guarantee of ZERO packet loss ever. This shows how seriously transit providers take such things. They even guarantee latency (the time taken to transfer packets). Even more oddly, such services are typically around a 50th of the cost of BT's connectivity to exchanges around the UK, where no service level guarantee exists for packet loss. It is a strange world we live in sometimes, isn't it?

12 comments:

Anonymous, Saturday, 8 February 2014 at 22:51:00 GMT
I'd be wary of predicting packet loss of one size packet to another. This is because packet loss due to a bug can be packet size dependent, for two reasons: 1) there can be different code paths for different sized packets, and 2) there can be a different code path when you get near the end of a circular buffer, and the alignment of packet to buffer is affected by its size.
This isn't a theoretical comment: this kind of bug has been present in DSLAMs which you probably have had to work with.
Of course, that's not the only kind of bug. A DSL line failing to adapt when the noise level has increased will act in the way you describe.

Chris, Sunday, 9 February 2014 at 12:41:00 GMT
Then there is packet loss due to deliberate action by the ISP, throttling certain connection types (not suggesting AA do this!). This is a variant of "link full" but could certainly happen on an idle link.

Something I would add to your explanation is that the corrupted packet will not be delivered all the way to the end point as the checksum is verified at each router along the way.
It may be possible by playing with a variant of traceroute to see approximately which section of the link has the most traffic loss....

Cecil Ward, Monday, 10 February 2014 at 01:39:00 GMT
Could you ask someone to take a look at my constant packet loss? many thanks -cwcc@a

Anonymous, Tuesday, 25 February 2014 at 14:10:00 GMT
What do you use to issue the LCP requests? Is there a command line tool like ping which does similar?