Thursday, 9 November 2017

Better broadband

As our customers will know, at AAISP, we take the quality of the broadband service we provide very seriously, and one of the things we do is an LCP echo every second on every line.

Whilst that may not mean much to some people, what it means is we are constantly testing the link to each and every customer every second to see if there are issues.
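To make that a bit more concrete, here is a minimal sketch of how one echo per second could be turned into loss and latency figures for a line. It is an illustration only, not the actual AAISP or FireBrick code: the names `IntervalStats` and `summarise` are made up for this example, and it assumes each echo is recorded as a round-trip time in milliseconds, or `None` if no reply came back.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class IntervalStats:
    """Summary of one batch of per-second echo results (hypothetical type)."""
    sent: int
    lost: int
    min_ms: Optional[float]
    avg_ms: Optional[float]
    max_ms: Optional[float]

    @property
    def loss_percent(self) -> float:
        # Share of echoes with no reply, as a percentage.
        return 100.0 * self.lost / self.sent if self.sent else 0.0


def summarise(rtts_ms: Sequence[Optional[float]]) -> IntervalStats:
    """Summarise echo results: each entry is an RTT in ms, or None if lost."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    if not replies:
        # Nothing came back at all, so there is no latency to report.
        return IntervalStats(len(rtts_ms), lost, None, None, None)
    return IntervalStats(
        sent=len(rtts_ms),
        lost=lost,
        min_ms=min(replies),
        avg_ms=sum(replies) / len(replies),
        max_ms=max(replies),
    )


if __name__ == "__main__":
    # Five seconds of made-up results: one lost echo and one latency spike.
    print(summarise([10.0, 12.0, None, 60.0, 10.0]))
```

The point of keeping minimum, average and maximum separately is that a single slow reply shows up in the maximum even when the average barely moves.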

Even though we are small compared to the big players like BT retail, we spot issues in back-haul networks before almost anyone else because of this monitoring. Other ISPs use FireBricks too, so we are not alone, but we are possibly one of the biggest ISPs doing so, collating the data over thousands of lines.

Over the years we have seen some really interesting issues, mostly with BT, but even with Talk Talk back-haul we have seen some issues. It is really good that both carriers are prepared to work with us to get issues resolved - although it can be an uphill struggle to get issues recognised initially.

The long grass.

One of the very first issues we found, many years ago, is something called "long grass". It is actually what made us start testing all lines all the time. It got that nickname because the latency on our graphs is shown in green at the bottom, and the long grass was spikes of green at frequent but irregular intervals. These spikes, occasionally as much as 50ms, were enough to interfere with VoIP calls.

It took months to get to the bottom of it, and it turned out to be some Juniper routers which seemed to stall for 50ms when they updated routing, e.g. when someone connected or disconnected a line (which is why the spikes are a bit random). Juniper provided BT with a patch and it was finally fixed. As I say, this was years ago.

Only our monitoring, which let us put together a complete and exact list of the hundreds of lines impacted, allowed it to be pinned down to specific makes of router. The fact that the same pattern of grass appeared on lines connected to the same specific router was also a clue. Almost any other ISP monitoring would not have picked up the issue at all, let alone made that connection.
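The correlation step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the analysis that was actually done: the 30ms threshold, the minimum of three spikes, and the names `has_spikes` and `affected_share_by_router` are all invented for the example, and it assumes we already know which router each line is connected to.

```python
from collections import defaultdict
from typing import Dict, Mapping, Sequence

SPIKE_MS = 30.0  # assumed: this far above the line's median counts as a spike


def has_spikes(latencies_ms: Sequence[float], min_spikes: int = 3) -> bool:
    """True if a line shows several latency spikes well above its own median."""
    if not latencies_ms:
        return False
    ordered = sorted(latencies_ms)
    median = ordered[len(ordered) // 2]
    spikes = sum(1 for value in latencies_ms if value - median >= SPIKE_MS)
    return spikes >= min_spikes


def affected_share_by_router(
    samples: Mapping[str, Sequence[float]],
    router_of: Mapping[str, str],
) -> Dict[str, float]:
    """For each router, the fraction of its lines that show spikes."""
    totals: Dict[str, int] = defaultdict(int)
    affected: Dict[str, int] = defaultdict(int)
    for line, latencies in samples.items():
        router = router_of[line]
        totals[router] += 1
        if has_spikes(latencies):
            affected[router] += 1
    return {router: affected[router] / totals[router] for router in totals}


if __name__ == "__main__":
    # Made-up latency samples (ms) for three lines on two routers.
    samples = {
        "line-a": [10, 10, 60, 10, 55, 10, 58, 10],
        "line-b": [12, 12, 70, 12, 65, 12, 12, 62],
        "line-c": [10, 11, 10, 12, 10, 11, 10, 10],
    }
    router_of = {"line-a": "router-1", "line-b": "router-1", "line-c": "router-2"}
    print(affected_share_by_router(samples, router_of))
```

A router where nearly every line is affected, next to routers where none are, is the sort of pattern that points at the equipment rather than at any individual line.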

As an aside, this level of issue is so specific, it is hard to see the likes of OFCOM ever understanding that this could count as a "fault" in any way. Even BT struggle to define this sort of thing as a "fault" and had it been just one line we would not have got them to fix it, which is a shame.

Dripping blood.

Another issue is congestion, which results in packet loss, and that slows down lines quite a lot. We call it "dripping blood" because we show loss as a percentage from the top of the graph in red. Even 1% of random packet loss can have a big impact on TCP file transfers. This is something that can happen for lots of reasons, but when we manage to correlate to a specific BRAS, metronode, exchange, or even cabinet, we can help ensure the issue is resolved.
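To give a feel for why even 1% loss matters, the well-known Mathis approximation puts a ceiling on steady-state TCP throughput of roughly (MSS / RTT) × (C / √p), where p is the loss rate and C is about √(3/2). The sketch below simply evaluates that formula. It is a rough model for classic Reno-style TCP with random loss, not a prediction for any particular line or for newer congestion control algorithms, and the function name and the example figures (1460-byte MSS, 20ms round trip) are assumptions for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Approximate TCP throughput ceiling in bits per second (Mathis model).

    mss_bytes: maximum segment size in bytes
    rtt_s:     round-trip time in seconds
    loss:      packet loss probability, 0 < loss <= 1
    """
    if not 0 < loss <= 1:
        raise ValueError("loss must be in the range (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    c = math.sqrt(3 / 2)  # constant from the simple periodic-loss derivation
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)


if __name__ == "__main__":
    # Assumed example figures: 1460-byte MSS and a 20ms round trip.
    for loss in (0.0001, 0.001, 0.01):
        mbps = mathis_throughput_bps(1460, 0.020, loss) / 1e6
        print(f"loss {loss:.2%}: about {mbps:.1f} Mbit/s")
```

With those assumed figures, 1% loss limits a single TCP stream to roughly 7 Mbit/s, whatever the sync speed of the line. Because the limit scales with 1/√p, a hundred times more loss means about a tenth of the throughput.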

On one occasion BT found they had some serious issues within their core network as a result of our monitoring and reporting to them. Again, like the long grass this impacted every ISP using BT back-haul. They had faulty / dirty fibre links, and had some serious misconfigurations on some ports and aggregate links. We had sight of the report to BT directors but sadly could not have a copy - and our monitoring graphs which we provided to BT were key throughout the report.

We had a similar issue on Talk Talk early on: a congested exchange. It turned out to be a port misconfigured at 100M not 1G (or was it 1G not 10G, I forget). They fixed it within the hour if I recall, thanked us, and ran a script that found half a dozen other similar errors where congestion was not yet happening.

Working with Talk Talk.

We have addressed many issues, especially with BT, simply because of the number of years we have been using them. Talk Talk have had fewer issues, and the main one has been some congestion for some customers. This has cropped up several times and been addressed with various workarounds. However, Talk Talk are taking this as seriously as we want them to and they now have a new back-haul network using new Juniper LTSs. This should sort the various capacity issues we have seen and ensure the service is the quality it should be.

The fact we have had issues over the last couple of months shows in the Think Broadband speed test survey, where last month we did not get the highest quality rating of 0.1, but only 0.2. This is a big concern, and probably down to these issues. We were still the fastest FTTC provider they tested though. Whilst November may not be better, we hope to be back to the best quality metric in December, and of course to retain our top spot for fastest FTTC. Having this sort of independent and impartial testing is very important for ISPs like us, and far more than just blowing our own trumpet.

Today we are testing with some customers on the new Talk Talk platform, and expect to switch over once they are properly ready in a couple of weeks. We are one of the first on the platform, so this testing is important. Obviously those using the new platform may find the lines get kicked off or reset whilst TT are working on this, but so far the testing is going well, and they can always move back at any point.

Not just us...

So, overall, we are keen to work with carriers to ensure their network is the best it can be and so ensure our service is the best it can be. If that happens to make things better for other ISPs and their customers, so be it. As long as our customers have the best, we are happy.

7 comments:

  1. I really do despise the number of times I've come across errors with almost all the suppliers due to them manually setting speed/duplex.

    I wonder how many ports there are in their networks that are misconfigured because they don't leave it at auto.

    BT can be very hit and miss at times. Port misconfigurations tend to be fixed very quickly, yet they can drag their heels at times. If the line isn't currently on fire, BT seem to reject the fault and you have to yell at them to progress. It doesn't help that BT hide 2nd line away and it's almost impossible to get in touch with them when you want to talk to someone who knows what they're doing.

    Virgin Media have pretty good staff that you can have technical conversations with. Colt are awful, Talk Talk have been fair in my experience. Level 3 are pretty decent, and Vodafone are a pain in the arse. I don't mind SSE either.

    There are others I'm forgetting no doubt. If I have to deal with DSL faults, I'll take BT over Talk Talk any day of the week.

    Replies
    1. Colt had to raise a ticket before they could find out which of their NTP servers we should use via our Gigabit fibre circuits - the query took 24 hours to resolve!?! One word of advice, always avoid BT unless you have no other viable alternative.

    2. I would probably rate Colt as the worst supplier I've dealt with, made even worse when they outsourced their technical support to India.

      Virgin Media are consistently the best. Their staff are helpful, they very rarely drag their heels when you need an engineer, and it's very easy to find out what's going on.

      I would say BT are middle of the pack. A pain if it's something more complicated, but for your bread and butter faults they'll usually get the job done without too much fuss.

  2. I have a TTB FTTC line and have been seeing drops/splodges of blood at peak evening times, most days of the week. I'm sure it's got nothing to do with your network, but it's nevertheless somewhat disappointing.

    Replies
    1. See: https://aastatus.net/2454

  3. Do you find that BT/TT take reports from AA more seriously now, having had you find issues that were otherwise missed previously?

  4. I've been seeing the long grass issue for the past three weeks or so. Strangely it is long grass for about an hour then short grass for about an hour. My line had permanently short grass before this change. Any suggestions?
