I am pleased to say that this morning's work seems to have been a total success. It did take rather longer than expected, with several "quirks" in the Cisco switch as well as one in the FireBrick.
Several people have asked what we are doing, so I thought I would explain.
We used to have a simple set-up of one LNS (L2TP Network Server) per BT back-haul fibre. This worked well. As we expanded, we increased the number of BT links and LNSs.
However, when we added BE and later Talk Talk back-haul, we used the same LNSs, so that we could do bonding between BT and TT lines. This worked pretty well.
The snag has come now that we need to expand the number of LNSs, but don't need more BT links (yet); indeed, we may need more TT links soon. The one-LNS-per-BT-link model no longer works.
So this morning we undertook the biggest restructure of our broadband handling network for around 8 years.
The change is that each BT and TT link now has a separate BGP connection on a device that is not an LNS. This then links over to a pool of LNSs, spreading the load and allowing more LNSs to be added. We deployed three more FireBrick FB6202 LNSs this morning.
This does, however, present some issues, the big one being how we balance the load to and from the carriers.
For traffic to BT/TT, we are using ECMP (Equal Cost Multi-Path). This balances traffic, but does so using a hash of IPs and ports. The snag is that there may not be many IPs and ports involved in L2TP: the port is always 1701, and in some cases there are very few IPs. BT actually expose all of their hundreds of LACs to us, but TT expose only four!
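To illustrate the problem, here is a minimal sketch of ECMP-style flow hashing (this is illustrative only, not the switch's actual hash, and all the addresses are made-up documentation IPs): each flow's IPs and ports are hashed, and the hash picks one of the equal-cost links. With L2TP always on port 1701 and only a handful of LAC addresses, the hash has very few distinct inputs to spread over.

```python
# Illustrative ECMP-style hashing: a flow's 4-tuple picks one of the
# equal-cost links. NOT the real hardware algorithm - just the idea.
import hashlib

LINKS = ["link-0", "link-1", "link-2"]  # hypothetical equal-cost paths


def pick_link(src_ip, dst_ip, src_port, dst_port):
    """Deterministically map a flow 4-tuple to one of the links."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return LINKS[int.from_bytes(digest[:4], "big") % len(LINKS)]


# One LNS endpoint talking L2TP (port 1701 both ends) to four LACs:
# at most four distinct hash inputs, so at most four link choices,
# however many sessions those tunnels carry.
flows = {pick_link("192.0.2.1", lac, 1701, 1701)
         for lac in ["198.51.100.1", "198.51.100.2",
                     "198.51.100.3", "198.51.100.4"]}
print(flows)
```

With so few tunnel endpoints, whole tunnels (and all the sessions inside them) land on whichever links those few hashes happen to select, which is why more endpoint IPs help.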
The traffic from BT/TT is not so easy - we can't expect BT or TT to use ECMP, so we have to balance it some other way, which we do by using specific LNS endpoint IPs. The snag is sharing N LNSs over M links: it is not simple when they do not divide up nicely. The answer is a set of NxM IPs, one per LNS/link pair, allowing sessions to be steered to any of these with a reasonably even distribution - an even spread over the LNSs and, at the same time, an even spread over the links to the carrier.
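The arithmetic behind the NxM scheme can be shown with a toy sketch (the counts here are made up for illustration): with one endpoint IP per (LNS, link) pair, steering sessions evenly over the NxM endpoints is automatically even over both the LNSs and the links, even when N and M do not divide nicely.

```python
# Toy model of NxM endpoint balancing: N LNSs, M carrier links,
# one endpoint per (LNS, link) pair. Counts are hypothetical.
from collections import Counter
from itertools import cycle

N_LNS, M_LINKS = 5, 3
endpoints = [(lns, link) for lns in range(N_LNS) for link in range(M_LINKS)]

# Steer 3000 sessions round-robin across the 15 endpoints.
sessions = [ep for ep, _ in zip(cycle(endpoints), range(3000))]

lns_load = Counter(lns for lns, _ in sessions)    # sessions per LNS
link_load = Counter(link for _, link in sessions)  # sessions per link
print(lns_load)   # 600 per LNS (3000 / 5)
print(link_load)  # 1000 per link (3000 / 3)
```

Each LNS owns M endpoints and each link carries N endpoints, so an even spread over endpoints yields an even spread over both dimensions at once - which 5 endpoints shared directly over 3 links could never give.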
This has the side effect of giving the hash used by ECMP more to play with.
The end result is a much cooler-looking "weather map" of our network with a lot more LNSs in use. It also means we can go back to rolling overnight updates of LNSs, which we expect to do over the next few weeks.
It allows more LNSs to be added without breaking things, and allows more links to either BT or TT as needed.
The next step is some fine tuning of our external links to transit and peering, but for now, we have the capacity we need to grow for some time.
This is hard work, and expensive kit, but it is well worth it.
Thank you for your patience.