The first step in upgrading our network is replacing some of the core switches with new, much faster and more powerful, switches.
Replacing switches is always fun!
For a start, they are in pairs to try and ensure continued operation of at least some of the network if one were to fail. Where possible, devices are connected to both switches, and where we have pools of devices they are spread between the two. We have some changes in the pipeline that will allow more of our equipment to use link aggregation over two switches for even better redundancy.
So, to move to a new switch, what do you do?
Well, first off, and surprisingly, you have to make space - you need the new switches basically next to the old ones in the rack. This may not be obvious, but if you are moving cables from one switch to another you need to make the move as short as possible. If not, then you have to re-route the cables or even get longer cables. So you have to shuffle stuff up/down to make space. Thankfully that worked well. You also have to check that the cables are going to be able to move, and that none are too short or snagged on anything.
Then, you make sure the new switch is the same config as the old. This is not simple as switch configuration is far from standard. There are VLANs and jumbo frames and all sorts to check very carefully. A lot of double checking is needed.
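To give a flavour of the double checking, here is a minimal sketch in Python. It assumes you have already boiled each switch's config down to a simple vendor-neutral summary by hand or by script. The `SwitchConfig` class, its fields and `config_differences` are all made up for illustration, not anything a real switch gives you.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchConfig:
    """Hypothetical summary of the settings worth comparing."""
    vlans: set = field(default_factory=set)         # VLAN IDs defined on the switch
    mtu: int = 1500                                 # a larger value would mean jumbo frames
    port_vlans: dict = field(default_factory=dict)  # port name -> set of VLAN IDs


def config_differences(old, new):
    """Return a list of human-readable mismatches between two configs."""
    problems = []
    missing = old.vlans - new.vlans
    if missing:
        problems.append(f"VLANs missing on new switch: {sorted(missing)}")
    extra = new.vlans - old.vlans
    if extra:
        problems.append(f"VLANs only on new switch: {sorted(extra)}")
    if old.mtu != new.mtu:
        problems.append(f"MTU differs: old {old.mtu}, new {new.mtu}")
    for port, vlans in sorted(old.port_vlans.items()):
        new_vlans = new.port_vlans.get(port)
        if new_vlans is None:
            problems.append(f"Port {port} not configured on new switch")
        elif new_vlans != vlans:
            problems.append(f"Port {port} VLANs differ: old {sorted(vlans)}, new {sorted(new_vlans)}")
    return problems
```

An empty list means the two summaries agree. It says nothing about settings the summary leaves out, so it does not replace a careful read through of the real config.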
You also configure the old and new switch so that all of the VLANs can link between them. This means you can plug the new switches into the old ones, so that anything already moved can still talk to anything that has not been moved yet.
Then, on the day, you move one cable at a time. Ideally, you cleanly shut down whatever is running on the device you are moving so that it falls back to other devices, then move the cable, check it, re-enable the functions, and check that. One by one, very carefully. Done right, you can move a lot of things with no impact on service at all - pairs of BGP servers can cleanly switch over, move, and switch back. Some things do cause disruption, like LNSs, where shutting one down makes its traffic reconnect to other LNSs.
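The one at a time procedure can be sketched as a simple loop. This is only an illustration in Python, not our actual tooling. The function name and the four callbacks (`drain`, `move_cable`, `is_healthy`, `restore`) are hypothetical stand-ins for steps that in real life are a person with a cable and a checklist.

```python
def migrate_one_at_a_time(devices, drain, move_cable, is_healthy, restore):
    """Move devices one by one, stopping at the first one that fails its check.

    Returns (moved, failed): the devices moved successfully, and the device
    that failed its check (or None if everything went fine).
    """
    moved = []
    for device in devices:
        drain(device)           # shut down cleanly so service falls back elsewhere
        move_cable(device)      # physically move the cable to the new switch
        if not is_healthy(device):
            return moved, device  # stop here; someone has to investigate
        restore(device)         # re-enable the functions on the moved device
        moved.append(device)
    return moved, None
```

The point of stopping at the first failure is that only one device is ever in an unknown state, which is what makes this approach low impact when it works.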
There can be (and were) problems! Basically the old switches had a head fit after moving many of the cables! This makes no sense, and meant power cycling the damn things. And, of course, moving cables back. It was not pretty.
We have tried this twice, and the second time we had Talk Talk suffer a major issue as well, which complicated matters, so even reverting the changes left us with all TT lines offline for a couple of hours.
So, this time, on Thursday, we are taking a new approach, called "big bang". The same careful config and checking, but without linking to the old switch: just carefully but quickly moving every cable to the new switch and then spending time checking each one. It will cause more disruption than the more usual step by step approach (when that works), but it is pretty predictable that it should actually work this time. However, there will be a clear time limit, and we will move all the cables back if we cannot get everything working within that time, in the middle of the night.
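The logic of "move everything, check everything, revert at the deadline" can be sketched like this. Again it is just an illustration in Python with made-up names (`big_bang`, `move`, `check`, `move_back`), and the time limit is whatever the ops team agrees beforehand.

```python
import time


def big_bang(cables, move, check, move_back, time_limit_s, clock=time.monotonic):
    """Move every cable, then keep checking until all pass or time runs out.

    Returns True if everything checked out, False if we had to revert.
    The clock is injectable so the logic can be exercised without waiting.
    """
    cables = list(cables)
    deadline = clock() + time_limit_s
    for cable in cables:
        move(cable)             # quickly move everything to the new switch
    pending = cables
    while pending and clock() < deadline:
        # In real life someone is fixing things between checks.
        pending = [c for c in pending if not check(c)]
    if pending:
        for cable in cables:
            move_back(cable)    # time is up: revert everything to the old switch
        return False
    return True
```

The design choice here is that the revert is all or nothing: with no link between old and new switches, leaving half the cables on each side would split the network.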
Good luck to the ops team doing this work...