Saturday, 3 March 2012

Slicker PPP

I have spent all morning tweaking PPP.

It is not that there was a problem or anything, but I was working on the whole area to get a DHCPv6 client on the FireBrick PPPoE side, so I was looking at it.

I had not realised we were taking a few seconds to fully negotiate.

The problem was that we did not send the next packet immediately on a state change for every normal event - we waited for a one-second poll to check for timeouts and/or the next thing to send.

In practice this worked well, but it introduced a couple of places where there was an extra delay of up to a second in the negotiation.
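To see why a poll-driven send adds "up to a second" per stall, here is a toy timing model. It is a sketch under stated assumptions, not FireBrick code: every exchange is a hypothetical fixed round trip, a packet that answers a received packet always goes out at once, and a packet that follows a state change either goes out at once or waits for the next one-second poll tick. How long each stall lasts depends on where in the poll cycle the state change falls. The names `negotiation_time` and `next_poll_tick` are illustrative only.

```python
import math

POLL_INTERVAL = 1.0  # seconds: the one-second timeout/next-send poll


def next_poll_tick(t: float, poll: float = POLL_INTERVAL) -> float:
    """Return the first poll tick strictly after time t."""
    return (math.floor(t / poll) + 1) * poll


def negotiation_time(steps, poll_driven: bool) -> float:
    """Total time to run a sequence of request/response exchanges.

    steps is a list of (rtt_seconds, after_state_change) pairs. A packet that
    answers a received packet always goes out at once. A packet that follows a
    state change waits for the next poll tick when poll_driven is True, and goes
    out at once when it is False.
    """
    t = 0.0
    for rtt, after_state_change in steps:
        if poll_driven and after_state_change:
            t = next_poll_tick(t)  # the stall: sit idle until the poll notices
        t += rtt
    return t


# Hypothetical bench link: 10 ms per exchange, two of which follow a state change.
steps = [(0.010, True), (0.010, False), (0.010, True)]
print(negotiation_time(steps, poll_driven=True))   # about 2.01 s
print(negotiation_time(steps, poll_driven=False))  # about 0.03 s
```

With only two stall points the poll-driven version is already roughly seventy times slower than the event-driven one on this made-up link, which is the shape of the problem described above.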

On our LNS side this did not happen, as the authentication is proxied, which eliminated one stall and left only the IPv6CP negotiation. The result was that IPv6 took up to a second longer to come up, which is probably why nobody noticed. After all, I have customers who would have complained if PPP was not working as well as it could (some are involved in the pppd stack on Linux).

Where it was really noticeable was when using the FB2500/FB2700 as a client, e.g. on FTTC. This had two or three places that could stall for up to a second.

Well, the changes were mostly simple. I already handled the logic that a received PPP message means sending one back immediately - either as a reply or as the next message to send.

When PPPoE completed its startup, we were not immediately sending the first PPP packet, so that was one stall, easily fixed.

I had also missed a step: when CHAP was accepted, I should immediately go on to start IPCP, so that was another easy fix.
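The two easy fixes share one idea: the handler for the state change sends the next packet itself instead of leaving it for the poll. Below is a minimal sketch of that pattern. The class, method names and string "packets" are all hypothetical; LCP option negotiation, CHAP hashing and retry timers are left out entirely.

```python
from enum import Enum, auto


class Phase(Enum):
    DEAD = auto()
    ESTABLISH = auto()     # LCP negotiation
    AUTHENTICATE = auto()  # CHAP
    NETWORK = auto()       # IPCP (then IPv6CP)


class PPPClient:
    """Toy PPP client: act on each state change at once, not on the next poll.

    Packets are plain strings and `send` is any callable that transmits one.
    """

    def __init__(self, send):
        self.send = send
        self.phase = Phase.DEAD

    def on_pppoe_session_up(self):
        # Stall 1: PPPoE has finished starting up, so start LCP right now.
        self.phase = Phase.ESTABLISH
        self.send("LCP Configure-Request")

    def on_lcp_opened(self):
        # CHAP is driven by the peer's Challenge, so nothing to send here.
        self.phase = Phase.AUTHENTICATE

    def on_chap_success(self):
        # Stall 2: authentication accepted, so start IPCP right now.
        self.phase = Phase.NETWORK
        self.send("IPCP Configure-Request")

    def on_poll(self):
        # The one-second poll remains, but only for retransmit timeouts.
        pass


sent = []
client = PPPClient(sent.append)
client.on_pppoe_session_up()
client.on_lcp_opened()
client.on_chap_success()
print(sent)  # ['LCP Configure-Request', 'IPCP Configure-Request']
```

Keeping the poll only for timeouts means it no longer sits on the normal path, so a healthy negotiation never waits for it.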

The less easy fix was when we sent an ACK and that completed a stage. We do IPCP before IPv6CP (though we could do them in parallel, I guess), and when we sent the IPCP ACK we did not also start IPv6CP. Fixing that meant the rx packet processing had to cope with two packets to send as a result instead of one. Again, not a big change.
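A sketch of the "two packets instead of one" change: the receive handler returns a list, so the reply (our Configure-Ack) and the start of the next stage (an IPv6CP Configure-Request) can both go out in the same pass. This assumes a heavily simplified IPCP where the layer opens once each side has ACKed the other's Configure-Request; Nak/Reject, identifiers and timers are omitted, and `handle_ipcp_rx` is a hypothetical name.

```python
def handle_ipcp_rx(state: dict, packet: str) -> list:
    """Process one received IPCP packet and return every packet to send now.

    state tracks whether we have ACKed the peer's Configure-Request
    ('we_acked'), whether the peer has ACKed ours ('peer_acked') and whether
    IPCP is open ('open').
    """
    out = []
    if packet == "Configure-Request":
        out.append("IPCP Configure-Ack")  # reply at once (options assumed acceptable)
        state["we_acked"] = True
    elif packet == "Configure-Ack":
        state["peer_acked"] = True
    if state["we_acked"] and state["peer_acked"] and not state["open"]:
        state["open"] = True
        # IPCP has just opened, so start IPv6CP in the same pass.
        out.append("IPv6CP Configure-Request")
    return out


state = {"we_acked": False, "peer_acked": False, "open": False}
print(handle_ipcp_rx(state, "Configure-Ack"))      # []
print(handle_ipcp_rx(state, "Configure-Request"))  # ['IPCP Configure-Ack', 'IPv6CP Configure-Request']
```

The second call is the case described above: our ACK completes IPCP, and with a single-packet return value the IPv6CP request would have been left for the next poll.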

The end result is that on the bench I have PPPoE and PPP negotiating all the way in under 50ms, and in fact all within a second of boot up.

Next test - FTTC line. How fast can we be online from power-up on a real connection? I'll add details when we have tested that :-)

Data we have so far :-

1. BE line using a Vigor modem, PPP kill from our end: loss of 2.5 seconds of pings at 0.1 second intervals.
2. After more tweaking, the same setup managed a PPP kill where we lost 1 second of pings from outside our network.
3. A software reload of an FB2500 on a Vigor on a 20CN line: 2 second downtime.
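The figures above are quoted as seconds of pings lost at a 0.1 second interval. One way to turn a ping trace into such a figure is to find the longest run of missing ICMP sequence numbers and multiply by the interval. This is only an assumed method for illustration, not necessarily how these numbers were measured, and the trace below is made up.

```python
def longest_outage(received_seqs, interval=0.1):
    """Longest gap in ping replies, in seconds.

    received_seqs holds the ICMP sequence numbers that got a reply, where one
    echo request was sent every `interval` seconds.
    """
    seqs = sorted(set(received_seqs))
    missing = max((b - a - 1 for a, b in zip(seqs, seqs[1:])), default=0)
    return missing * interval


# Hypothetical trace: replies 0-9 arrive, 10-34 are lost, 35-40 arrive.
replies = list(range(0, 10)) + list(range(35, 41))
print(longest_outage(replies))  # 2.5
```

So "2.5 seconds of pings" corresponds to about 25 consecutive lost replies at this interval.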


  1. Don't worry about it, RevK.

    Just draw the screen and the mouse and chill out.

    The fact that the system is bloody useless for several minutes after screen draw is irrelevant - it's the screen draw that stops the clock.



  2. Measured how long the Openreach modem takes to sync yet? If it takes more than a second from power on to ready to pass data, you're already fast enough that you're not the bottleneck in recovering from a power outage.

    Would also be interesting to know how long the Openreach ONTs take from power on to passing data - if an FB2500 can do PPPoE before the ONT is ready to pass data, you're golden.

  3. Well, we did a "PPP kill", which still has a one second poll loop in the reconnect cycle, and that dropped 2.5 seconds of pings based on a 0.1 second interval, on a BE line (no resync). I bet I can do better.

  4. Be aware of the unintended consequences of your improvements: can your authentication servers deal with one or two exchanges' worth of clients recovering from a power outage within the same second?

    1. These changes do not really affect that - we already have to handle thousands of connections within a few seconds. And the PPP will wait for the authentication - the change is all down to what happens when all is well and replies are correctly received.