As our customers will know, at AAISP, we take the quality of the broadband service we provide very seriously, and one of the things we do is an LCP echo every second on every line.
Whilst that may not mean much to some people, what it means is we are constantly testing the link to each and every customer every second to see if there are issues.
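To give a feel for what testing every second produces, here is a minimal sketch of how per-second echo results could be boiled down into the latency and loss figures that end up on a graph. This is not the FireBrick's actual code: the `summarise` function, the `IntervalStats` type, and the idea of a list with one entry per second (`None` for an echo that got no reply) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntervalStats:
    """Summary of one graphing interval (hypothetical structure)."""
    sent: int
    lost: int
    loss_pct: float
    min_ms: Optional[float]
    avg_ms: Optional[float]
    max_ms: Optional[float]


def summarise(samples: Sequence[Optional[float]]) -> IntervalStats:
    """Summarise per-second echo results.

    Each entry is the round-trip time in milliseconds for one echo,
    or None if no reply came back (counted as loss).
    """
    replies = [s for s in samples if s is not None]
    sent = len(samples)
    lost = sent - len(replies)
    loss_pct = 100.0 * lost / sent if sent else 0.0
    if not replies:
        # Nothing came back, so there is no latency to report.
        return IntervalStats(sent, lost, loss_pct, None, None, None)
    return IntervalStats(
        sent=sent,
        lost=lost,
        loss_pct=loss_pct,
        min_ms=min(replies),
        avg_ms=sum(replies) / len(replies),
        max_ms=max(replies),
    )


if __name__ == "__main__":
    # Four seconds of samples: one lost echo and one latency spike.
    print(summarise([10.0, 12.0, None, 60.0]))
```

Keeping the minimum, average and maximum separately is what makes a brief spike visible even when the average latency over the interval barely moves.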
Even though we are small compared to the big players like BT Retail, we spot issues in back-haul networks before almost anyone else because of this monitoring. Other ISPs use FireBricks too, so we are not alone, but we are possibly one of the biggest doing so, and we collate the data over thousands of lines.
Over the years we have seen some really interesting issues, mostly with BT, but even with Talk Talk back-haul we have seen some issues. It is really good that both carriers are prepared to work with us to get issues resolved - although it can be an uphill struggle to get issues recognised initially.
The long grass.
One of the very first issues we found, many years ago, is something called "long grass". It is actually what made us start testing all lines all the time. It got that nickname because the latency on our graphs is shown in green at the bottom, and the long grass was spikes of green at frequent but irregular intervals. These spikes, occasionally as much as 50ms, were enough to interfere with VoIP calls.
It took months to get to the bottom of it, and it ended up being some Juniper routers which seemed to stall for 50ms when they updated routing, e.g. when someone connected or disconnected a line (which is why the spikes are a bit random). Juniper provided BT with a patch and it was finally fixed. As I say, this was years ago.
Only our monitoring, which let us put together a complete and exact list of the hundreds of lines impacted, allowed this to be pinned down to specific makes of router. The fact that the same pattern of grass appeared on lines connected to the same specific router was also a clue. Almost any other ISP's monitoring would not have picked up the issue at all, let alone made that connection.
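The kind of correlation described here can be sketched roughly as follows. This is an illustration only, not how we actually did it: the `affected_fraction_by_group` function, the line names, and the router names are all made up, and it assumes you already have a list of which lines show the problem and which router each line is connected to.

```python
from collections import defaultdict
from typing import Dict, Iterable


def affected_fraction_by_group(
    line_group: Dict[str, str], affected: Iterable[str]
) -> Dict[str, float]:
    """Return, for each group, the fraction of its lines that are affected.

    line_group maps a line identifier to whatever it shares with other
    lines (here a router name; it could equally be a BRAS or an exchange).
    """
    affected_set = set(affected)
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for line, group in line_group.items():
        totals[group] += 1
        if line in affected_set:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Hypothetical data: four lines spread over two routers.
    lines = {
        "line-a": "router-1",
        "line-b": "router-1",
        "line-c": "router-2",
        "line-d": "router-2",
    }
    spiky = ["line-a", "line-b"]
    for router, fraction in sorted(affected_fraction_by_group(lines, spiky).items()):
        print(f"{router}: {fraction:.0%} of lines affected")
```

A group where nearly every line is affected, next to groups where almost none are, is the sort of pattern that points at one piece of kit rather than at individual lines.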
As an aside, this level of issue is so specific, it is hard to see the likes of OFCOM ever understanding that this could count as a "fault" in any way. Even BT struggle to define this sort of thing as a "fault" and had it been just one line we would not have got them to fix it, which is a shame.
Another issue is congestion, which results in packet loss and slows lines down quite a lot. We call this "dripping blood" because we show loss as a percentage from the top of the graph in red. Even 1% of random packet loss can have a big impact on TCP file transfers. This is something that can happen for lots of reasons, but when we manage to correlate it to a specific BRAS, metronode, exchange, or even cabinet, we can help ensure the issue is resolved.
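To put a rough number on the effect of loss, the well-known Mathis approximation says a single TCP connection's throughput is limited to about (MSS / RTT) × 1.22 / √p, where p is the loss rate. The sketch below just evaluates that formula. It is a simplified model of classic TCP congestion avoidance with random loss, so real transfers (and newer congestion control algorithms) will differ; the function name and the example figures are chosen purely for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on one TCP connection's throughput in bit/s.

    Uses the Mathis approximation: (MSS / RTT) * (C / sqrt(p)),
    with C of about 1.22. loss is a fraction, e.g. 0.01 for 1%.
    """
    if not 0 < loss <= 1:
        raise ValueError("loss must be a fraction greater than 0 and at most 1")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss))


if __name__ == "__main__":
    # Illustrative figures: 1460 byte MSS, 20ms round trip, 1% loss.
    bps = mathis_throughput_bps(1460, 0.020, 0.01)
    print(f"{bps / 1e6:.1f} Mbit/s")
```

With those example figures the model comes out at roughly 7 Mbit/s for a single connection, however fast the line itself syncs, which is why even 1% loss is so noticeable on a fast line.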
On one occasion BT found they had some serious issues within their core network as a result of our monitoring and reporting to them. Again, like the long grass this impacted every ISP using BT back-haul. They had faulty / dirty fibre links, and had some serious misconfigurations on some ports and aggregate links. We had sight of the report to BT directors but sadly could not have a copy - and our monitoring graphs which we provided to BT were key throughout the report.
We had a similar issue on Talk Talk early on: a congested exchange. It turned out to be a port misconfigured at 100M not 1G (or was it 1G not 10G, I forget). They fixed it within the hour if I recall, thanked us, and ran a script that found half a dozen other similar errors where congestion was not yet happening.
Working with Talk Talk.
We have addressed many issues, especially in BT, simply because of the number of years we have been using them. Talk Talk have had fewer issues, and the main one has been some congestion for some customers. This has cropped up several times and been addressed with various workarounds. However Talk Talk are taking this as seriously as we want them to and they now have a new back-haul network using new Juniper LTSs. This should sort the various capacity issues we have seen and ensure the service is the quality it should be.
The fact we have had issues over the last couple of months shows in the Think Broadband speed test survey, where last month we did not get the highest quality rating of 0.1, but only 0.2. This is a big concern, and probably down to these issues. We were still the fastest FTTC provider they tested though. Whilst November may not be better, we hope to be back to the best quality metric in December, and of course to retain our top spot for fastest FTTC. Having this sort of independent and impartial testing is very important for ISPs like us, and is about far more than just blowing our own trumpet.
Today we are testing with some customers on the new Talk Talk platform, and expect to switch over once they are properly ready in a couple of weeks. We are one of the first on the platform, so this testing is important. Obviously those using the new platform may find the lines get kicked off or reset whilst TT are working on this, but so far the testing is going well, and they can always move back at any point.
Not just us...
So, overall, we are keen to work with carriers to ensure their network is the best it can be and so ensure our service is the best it can be. If that happens to make things better for other ISPs and their customers, so be it. As long as our customers have the best, we are happy.