2018-07-11

Very frustrating modem bug

I blogged in 2013 about a bug in BT-supplied FTTC/VDSL modems (here). BT and the manufacturer fixed that modem, but sadly the same bug seems to exist in other modems.

We're seeing this in some Zyxel modems now, and I am pretty sure I have had reports of other models as well. The problem is that it is not always obvious what the issue is: people turn things off and back on again, and that fixes it, so we do not always get clear reports of issues.

The issue is that the modem seems to have some sort of packet acceleration in the chipset - for what reason we cannot imagine - and it seems that it caches around 254 different IP/port/protocol sets. This is a bit like header compression by the sound of it.

This means the headers of cached flows are reconstructed rather than passed through, which would be fine if not for a bug. It seems that when passing PPPoE the IP/port/protocol are matched but the PPPoE session ID field is not, yet the session ID is part of what is reconstructed.

This means that on any fixed IP line (the IP being one of the things that must not have changed to hit the cache), a PPP restart leaves any packets matching the cache from the previous PPPoE session not working (they are sent down the line with the PPPoE session ID of the previous session, and so are dropped).
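
To make the failure concrete, here is a toy model of what the modem appears to be doing. This is only a sketch of the suspected behaviour, not real firmware: the names `FlowKey` and `BuggyModem` are made up for illustration, and the real cache will be in the chipset and look nothing like a Python dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """What the cache appears to match on: IPs, ports and protocol only."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


class BuggyModem:
    """Toy model of the suspected fast path (hypothetical, for illustration)."""

    def __init__(self):
        # FlowKey -> PPPoE session ID seen when the flow was first cached
        self.cache = {}

    def forward(self, key, session_id):
        """Return the PPPoE session ID that ends up on the wire."""
        # The bug being modelled: the lookup ignores the session ID of the
        # frame being forwarded, but the cached session ID is what gets sent.
        return self.cache.setdefault(key, session_id)

    def ethernet_port_reset(self):
        """Unplugging and re-plugging the Ethernet cable clears the cache."""
        self.cache.clear()


modem = BuggyModem()
voip = FlowKey("192.0.2.10", "198.51.100.5", "udp", 5060, 5060)

modem.forward(voip, session_id=0x1111)          # flow cached in the first session
stale = modem.forward(voip, session_id=0x2222)  # PPP restarted, same fixed IP
print(hex(stale))                               # 0x1111, so the far end drops it

modem.ethernet_port_reset()
print(hex(modem.forward(voip, session_id=0x2222)))  # 0x2222, working again
```

A dynamic IP line would normally come back with a different address, so the key no longer matches and a fresh entry is created with the right session ID.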

For a lot of people this does not matter - dynamic IP has no issue, and even a lot of Internet traffic uses different IP/ports each time, e.g. accessing web pages, so it just works.

Sadly there are a number of protocols that easily break, including VPNs like IPsec, and VoIP. So a simple PPP restart on a line causes things not to work. The other issue is that these protocols tend to keep trying, so the cache never clears or times out. It just stays not working.
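
Why retrying makes it worse can be sketched as follows. This assumes the cache drops an entry after some idle period, which is a guess: the actual eviction policy and timings in the modem are unknown, and `IdleCache`, `stale_hits` and the 60 second figure are invented for the example.

```python
IDLE_TIMEOUT = 60.0  # seconds; purely illustrative, the real value is unknown


class IdleCache:
    """Hypothetical cache whose entries expire only after a quiet period."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow name -> time of the most recent packet

    def touch(self, flow, now):
        """Record a packet at time `now`; return True if a live entry was hit."""
        last = self.last_seen.get(flow)
        hit = last is not None and now - last < self.idle_timeout
        self.last_seen[flow] = now  # every packet refreshes the entry
        return hit


def stale_hits(retry_interval, attempts, cache):
    """Send `attempts` packets spaced `retry_interval` seconds apart."""
    return [cache.touch("ipsec", n * retry_interval) for n in range(attempts)]


# Retrying every 30 s keeps refreshing the (stale) entry, so it never expires.
print(stale_hits(30, 4, IdleCache()))   # [False, True, True, True]

# Only a gap longer than the timeout lets each packet start from scratch.
print(stale_hits(120, 4, IdleCache()))  # [False, False, False, False]
```

Under that assumption, a VPN or phone that retries every few seconds holds the bad entry in place indefinitely, which matches what we see.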

Resetting the modem is one (slow) way to fix it, but all it actually needs is the Ethernet port to be reset - just un-plug the cable and re-plug. This causes the modem to clear the cache (why it does this on a port reset and not on seeing a PADI, I have no idea, but bugs are funny like that).

When this was one modem and the issue was fixed, that was fine. Now that it seems to affect more modems, and is not fixed, it needs some workarounds. So today I have added an Ethernet port reset feature to our FireBricks so that the port connected to the modem is reset for a second when PPPoE shuts down.
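
For anyone wanting the same effect on a Linux router rather than a FireBrick, the idea can be sketched like this. This is not the FireBrick code, just a minimal illustration: `bounce_port` and `ip_link` are names made up here, the interface name is an assumption, and you would still need to hook it to whatever your PPPoE client runs when the session goes down.

```python
import subprocess
import time


def ip_link(interface, state):
    """Set a Linux interface 'up' or 'down' using iproute2 (needs root)."""
    subprocess.run(["ip", "link", "set", "dev", interface, state], check=True)


def bounce_port(interface, hold_seconds=1.0, set_link=ip_link, sleep=time.sleep):
    """Take the modem-facing port down, hold it for a moment, bring it back up.

    `set_link` and `sleep` are parameters so the logic can be tried out
    without root and without touching a real interface.
    """
    set_link(interface, "down")
    sleep(hold_seconds)
    set_link(interface, "up")


if __name__ == "__main__":
    # Dry run: record what would be done instead of calling `ip`.
    actions = []
    bounce_port(
        "eth0",  # assumed name of the port connected to the modem
        set_link=lambda ifname, state: actions.append((ifname, state)),
        sleep=lambda seconds: actions.append(("sleep", seconds)),
    )
    print(actions)  # [('eth0', 'down'), ('sleep', 1.0), ('eth0', 'up')]
```

Holding the link down for about a second mirrors what the FireBrick feature does; whether a shorter drop is enough for a given modem to notice is something you would have to test.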

It is a bodge to work around someone else's bug, but it is a pragmatic step. It is in the latest FireBrick alpha release for testing.

What fun?!?

P.S. Until now people have used the profile feature of the FireBrick to reset the port when needed because a VPN would not come up due to this modem bug. This is just how flexible the FireBrick is, but it was not easy to do for VoIP related issues, and so I felt it was time for some special code for this.

9 comments:

  1. I've not run into this on the white BT modems (half a dozen in service) nor the BT Business Hub 5 or above (dozen + in service). All in bridge mode. Did BT fix their modems? I do have one Zyxel modem in service on Warwicknet; interestingly I have had on one occasion a fault where an IPsec VPN died and would not reconnect, however a restart of libreswan fixed it so not clear if I hit this bug?

    Replies
    1. AFAIK BT fixed theirs. A break in the usage for a while can cause the cache to be cleared / replaced, so stopping an IPsec link and restarting later may do it.

    2. I do ponder how I could ensure the modem got its update though... Nowadays I'm buying these modems on eBay primarily, new in box, bridging them and putting them into service. Presumably that means they are not getting updated? I don't see any option to perform manual software updates...

  2. Looks like this was seen on Zyxel modems starting at least a few years back

    "PPPoE Session-ID caching bug (In Bridge mode)"
    https://support.aa.net.uk/VMG1312-B10A:_Bugs

    Replies
    1. Yeh, we are seeing more cases definitely confirmed and different models as well now.

    2. Anything that low-level is more likely to have been written by the chip vendor. At least when I worked on dsl modems a few years ago, the likes of zyxel and BT didn't do much more than the UI on CPEs.

    3. Got hit by that one in a major way a couple of years back. I've not used a zyxel since - I trawl ebay for BT modems as required.

  3. What do ZyXEL have to say about it?

    They still refuse to add support for RFC4638 despite it just being a couple lines of code that need changed (the exact same changes on most devices).

    https://github.com/Olipro/VMG1312-B10A/releases

    Replies
    1. And here's the RFC4638 jumbo frames in bridge mode patch for the VMG3925-B10B:

      https://github.com/trejan/VMG3925-B10B

      YMMV but it works fine on my modems, a FB2900 (with the Ethernet port reset feature mentioned above enabled) and bonded AAISP Soho::1 lines.

