I don't really have anything to rant about this week, sorry...
A lot is happening at A&A, and that is really quite exciting.
It is probably worth starting by saying more on the issues we had with our Cisco switches some weeks ago. We do not use much Cisco kit, and indeed, until a couple of years ago I was quite proud to say we had no Cisco kit at all - our routers and LNSs are all FireBrick. However, whilst FireBrick are working on 10Gb/s routers, we don't have any FireBrick switches. So we did get Cisco switches, which do a tiny bit of BGP to carriers. Apart from those blips, these have worked well. We employed Cisco-trained staff to set them up in the first place, have other staff who have been on Cisco courses, and have engaged another expert Cisco engineer on some occasions, including in the post mortem of these issues.
It still looks a lot like one main issue that happened twice, and a different issue that happened when we rebooted one of the switches the second time. We can be sure the issue is somehow in the Cisco switches. It also seems unlikely that it could be spanning tree or anything else like that - we had all BGP sessions to carriers stop, and each of these is a simple direct BGP session to a directly connected endpoint on a specific port on the switch in question. Failure of BGP in that case should not be possible: even if every other port was shut down, even if all inter-switch links had failed, BGP on the directly connected port should stay working. The fact this happened on all of these links, covering four separate switches, suggests something "upset" all of the switches by some means, and we have failed to get to the bottom of it conclusively.
We have, however, set up a lot more logging, and made a number of "defensive" config changes which could cater for possible causes, albeit clutching at straws. It does mean that if it happens again we will be in a far better position to diagnose properly and involve Cisco TAC, as we will have the logs needed. I appreciate this does not sound good, and to be frank, it is not.
However, they are being very stable now, and we do have all the redundant links back in operation, and all seems well! I hope customers can appreciate that we take this seriously. I hope we can put this behind us now, and the capacity of these switches allows for a lot more expansion of our network without adding any more complexity to their configuration.
We had careless and e.gormless have some issues in the last couple of weeks, and each needed a reboot. This affected a small portion of customers, but was a pain none the less. It turns out the cause is the same for both, and we have actually found the bug (this is why I have Internet access when on a cruise ship). It is an issue in the FireBrick LNS code for a really specific edge case (aren't they all) which was causing a memory leak. We have a fix, obviously, and in the meantime we have managed to deploy a workaround for the one specific customer line that was triggering this. The next LNS rolling update will include the fix.
Talk Talk packet loss
Another issue we have had, and which looks to be behind us, is the low level of packet loss on the Talk Talk back-haul. Again, I think this is all sorted, and it comes down to Talk Talk involving Juniper JTAC and making some significant changes to the way their network works just before it connects to us (and a lot of other ISPs). It is not just us that can have unexpected issues like this with industry standard routers and switches.
But there are more things happening, and I thought I would touch on them. For obvious reasons you have to take this all with a pinch of salt, as things are not set in stone yet.
The new FireBrick...
Again, I cannot say a lot - we are launching a successor to the FB2700. The real news will come soon, when we have final application software running and can fully benchmark it. It should be a lot faster, as the FB2500 did 100Mb/s max and the FB2700 did 350Mb/s max. I am hoping for nearer 1Gb/s throughput. We are, however, pretty sure it will not do full table BGP. At this stage we are sorting EMC testing and final artwork and many other things - stuff can go wrong and delay for weeks or months.
When launched, which could be within a couple of months, it will have this extra performance, but we hope soon after to have additional software features if possible. I am hoping for much faster crypto (IPsec) to be honest, but again, until we finally get to benchmark it, we cannot tell. We just know that the underlying specs of the chipsets, even with the same s/w, should be a lot faster than the FB2700.
One of the reasons I am a tad vague is that the throughput of things like this can massively depend on some of the low level features of the chipset. It is not enough to just say that the CPU is faster and the RAM is faster - a lot of time is taken by cache management. Sadly the exact way the cache works in practice is not something one can glean from a data sheet as fully as you would expect. We have been caught out in the past with an Intel-based chipset for the prototypes of the current FB6000, where some simple operations that should ideally be one clock cycle literally took many hundreds of clock cycles and were needed on every interrupt, none of which was in the data sheet. We had to change the chipset for the current FB6000 series. I am optimistic for the successor to the FB2700, and expect things to come out well as the new chipsets "seem" to be really good. If they are as good as we hope, we will have a really really nice FireBrick. Worst case, we will have something better and faster than the FB2700. I also hope it will be cheaper, but that too is yet to be finalised.
There are, however, a couple of things I can confirm. For a change, one thing we have announced as "coming soon" before is now reality, and that is 19" rack mounting!
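To put rough numbers on why a few hundred wasted clock cycles matters, here is a back-of-envelope sketch of the cycle budget per packet at line rate. The 1GHz CPU clock is purely an assumption for illustration (it is not a FireBrick spec), and the function names are made up for this example; the only hard facts are the standard Ethernet framing overheads.

```python
# Rough per-packet cycle budget at line rate.
# ASSUMPTION: the CPU clock used below is illustrative, not a FireBrick figure.

# Per-frame overhead on the wire: preamble (7) + start delimiter (1)
# + inter-frame gap (12) bytes.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry for a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cpu_hz: float, link_bps: float, frame_bytes: int) -> float:
    """CPU clock cycles available per frame if the link is kept full."""
    return cpu_hz / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    assumed_cpu_hz = 1e9  # hypothetical 1GHz core
    link_bps = 1e9        # 1Gb/s
    for frame_bytes in (64, 1518):  # smallest and largest standard frames
        pps = packets_per_second(link_bps, frame_bytes)
        budget = cycles_per_packet(assumed_cpu_hz, link_bps, frame_bytes)
        print(f"{frame_bytes:>5} byte frames: {pps:>12,.0f} pkt/s, "
              f"{budget:>8,.0f} cycles per packet")
```

Under those assumptions, small packets leave a budget of only about 672 cycles each, so an operation that costs hundreds of cycles instead of one, if it were hit for every packet, would eat a large share of that before any real work is done.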
We have ears to allow one or two of the new FireBricks in a 19" rack mount fitting, or one in a wall mount fitting.
The other, more subtle, feature is a completely new power supply system. This means that, in addition to mains (110V/220V), we have DC supply options - two versions, one for automotive (12V and 24V), and one for telecoms racks (-48V). The DC options are actually a lot more complex than you would imagine, as automotive has to handle some nasty spikes in some edge cases. I made the decision to have DC as an option, even if we expect relatively few customers to need it. It should also be a lot cooler!
As you will see from the picture, the final part is the SFP slot, which will allow fibre, copper, and maybe even VDSL based SFP modules to be used. Note VDSL SFP is outside SFP spec on power, so we are not sure yet, but it looks encouraging so far.
More capacity in A&A core
We have a lot of capacity now, and are not the bottleneck (which is always our aim), but we are working on yet more capacity. We have massive headroom on the Talk Talk backhaul, and we are adding more headroom to the BT back-haul. We are also updating the links we have on some peering to allow for more capacity to the likes of Netflix. A lot more 10Gb/s links are involved. This is all well ahead of usage, by some large margin. We are taking the "not the bottleneck" aim very seriously and making sure we are well ahead of the game in terms of increasing internet usage.
I know we are not the cheapest ISP, even if reasonably competitive in many cases, but making sure we are not the bottleneck so that you get the speed your line can handle is quite an undertaking. Quality matters.
This is where things really are up in the air - they depend not only on things like increased capacity (as above) but also on complex negotiations with multiple carriers, increased capacity on peering and transit, and then a lot of work on our internal systems and ordering processes.
What am I hoping? Well, no commitments yet, but I am hoping for more download allowance on the Home/SoHo non-terabyte tariffs, i.e. increased allowance at the same price. I am also hoping to extend the terabyte packages to allow more lines to have them, and to make upgrades to these packages easier. I am really hoping for better minimum terms, but that really is tricky as we can so easily be stung by carriers.
One thing I am really keen on is making the tariffs simpler and easier to understand, something we always strive for. I also want to make them more available to all, not just those where we can get Talk Talk back-haul. Sadly old 20CN lines will always be the legacy and exception, sorry, but these are gradually getting upgraded.
As always, new tariffs are available to existing customers when they come out. Some will be automatic (e.g. if we can increase usage allowances), and for some you can order a regrade to a new tariff when you want. If you join A&A today, then you will benefit from the new tariffs I hope to have in a couple of months' time.
As a slight insight, trying to get better back-haul rates out of one carrier led to our lawyer calling the contract they sent "opaque as a brick", which says a lot for how hard some of this can be. He could not even advise whether we should sign it or not, and he is a really good lawyer.
Please do not hassle staff!
Some of my staff will be annoyed that I have posted this all as they will be fielding questions! Seriously, they do not know more than I have posted here. I do not know more than I have posted here yet. Please, just wait and see.