Tuesday, 1 July 2014

Are TT getting as bad as BT?

We have very good monitoring, and I have spotted 43 of our TalkTalk backhaul lines that look like this. Not good. Some serious latency with identical pattern on every line so clearly a common / backhaul issue.
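The "identical pattern on every line" reasoning can be sketched as a simple check: if nearly all monitored lines rise above their own baseline at the same sample times, the cause is probably shared rather than per-line. This is only an illustration and not how our monitoring actually works; the function name, the input format, the thresholds and the sample data are all made up for the example.

```python
from statistics import median

def common_mode_samples(series, baseline_samples=3, rise_ms=20.0, min_fraction=0.9):
    """Return indices of samples where most lines are elevated together.

    series: dict mapping a line identifier to a list of latency samples in ms,
            all taken at the same times (hypothetical input format).
    """
    # Per-line baseline: median of the first few samples, assumed to be healthy.
    baselines = {line: median(s[:baseline_samples]) for line, s in series.items()}
    n_samples = min(len(s) for s in series.values())
    flagged = []
    for i in range(n_samples):
        # Count the lines whose latency has risen by at least rise_ms.
        elevated = sum(
            1 for line, s in series.items() if s[i] - baselines[line] >= rise_ms
        )
        if elevated / len(series) >= min_fraction:
            flagged.append(i)
    return flagged

# Made-up latency samples (ms) for three lines, loosely shaped like the fault.
example = {
    "line-a": [8, 9, 8, 30, 100, 104],
    "line-b": [12, 12, 13, 34, 103, 108],
    "line-c": [10, 10, 10, 31, 99, 105],
}
print(common_mode_samples(example))
```

With these made-up numbers the last three samples are flagged, because every line is at least 20ms above its own baseline at the same moments.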

Reported as an "incident" but apparently as there are no alarms or other wholesalers reporting issues they refuse to accept it.

They want me to report as individual faults, 43 of them.

They then want the CLIs (we provided all 43 already).

They want to know the exact issue and I had said already: "Latency started around 16:20, increased to around 30ms at 16:30, decreased a bit and then back to 30ms around 16:55, stayed reasonably steady until 17:15 then increased up to 100ms, dipped a bit around 17:35 and 17:46 then up to a solid 100ms at 18:00 and steady since, now at 104ms as has been for well over an hour."

They asked where the latency is to, and I explained: to "us", i.e. LCP echoes from our LNS to the end user router.

They asked for traceroutes to bbc and google, as if I was reporting an IP level fault! I told them not to be silly. I sent all the details yet again.

When I sent all that they had the cheek to ask "Pleae[sic] put this information in a straight forward email please. The way you have formatted this is confusing and doesn't help us!"

I am at a loss, I sent it all again. I don't know how to say it in a more straight forward way!

Are TalkTalk really getting as bad as BT?! I hope not.

Update: It looks like the normal email practice of selectively quoting their email with a ">" prefix on the lines and interleaving a response to each point is somehow confusing them. I have suggested maybe they learn how to use email.
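For anyone who has not seen the convention, here is a minimal sketch of what an interleaved reply looks like when built by hand. The function name and the example questions and answers are invented for illustration; a real mail client does this for you when you hit reply.

```python
def interleaved_reply(points):
    """Build a plain-text reply from (quoted_text, response) pairs."""
    parts = []
    for quoted, response in points:
        # Prefix every line of the quoted text with "> ".
        quoted_lines = ["> " + line for line in quoted.splitlines()]
        parts.append("\n".join(quoted_lines))
        parts.append("")        # blank line between the quote and the reply
        parts.append(response)
        parts.append("")        # blank line before the next quote
    return "\n".join(parts).rstrip() + "\n"

# Illustrative content only, not the actual emails.
reply = interleaved_reply([
    ("Which lines are affected?",
     "The 43 lines listed in my first email."),
    ("Where is the latency measured to?",
     "LCP echoes from our LNS to the end user router."),
])
print(reply)
```

Each quoted question is followed directly by its answer, so the result reads top to bottom without needing any "with respect to your second question" preamble.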

Update: Someone with clue in TT has picked up on that (seeing my blog post) and is on the case. I expect he'll be able to track the issue down. Maybe I started with the wrong team here and I apologise if I used the wrong channels.

Update: "On call engineers are being scrambled now - we have an issue in the wider Oxford area and you should see an incident coming through shortly." So the right people are sorting it now.

Update: All fixed by 23:25 - I have updated the image to show the start to end of the fault. Well done to Peter at TT for getting engineers on site to sort this.

Update: I think I have worked out how to do this next time - I'll compose the reply as normal - my email client using fixed space font and colours to mark the quoted text and my replies really clearly (as it does). Then I'll screen shot it, and put the image in a word document then send a blank email with that as an attachment. I suspect that will work for them.


  1. You've been using the Internet too long Adrian - those of us who grew up using text based email and USENET know how to avoid top posting and reply properly. Youngsters have been corrupted by Outlook and haven't a ****ing clue.

    1. Not to mention that trying to reply properly in Outlook just causes a complete mess...

    2. Also not to mention some/most CRM software (Siebel for example) that tries to parse the response, remove CR/LF, etc., which can get very messy very quickly, and does not attach email correctly.
      A Word/PDF attachment is the easiest solution for long mail with multiple quotes.
      So it might be a layer 7 and not layer 8 issue :-)

  2. I'm amazed that people have trouble reading words that are in order on a page! Question, answer, question, answer. It's not computer science people!

    1. I know, my final reply said that their habit of top posting all replies was confusing and asked them to stop it. :-)

  3. Actually I like reading email chains in reverse order, so I like top posting. The trouble with the so called "one true way" of quoting emails by replying inline or at the bottom is you have to scroll through a long email all of which you've read before to get to the new bit, or try to find it interleaved in the middle (not always easy with a 3 week old email trail). Top posting makes the newest comment much easier to find.

    My inbox shows the most recent messages at the top. Isn't that the same as top posting?

    And I have been around long enough to count as an old timer: I've been on the Internet since 1987.

    1. Not really, the "right way" is to selectively quote the paragraphs to which you are replying and reply after each, hence keeping most emails short and easy to read on their own in context. This allows the email to be read top to bottom like a book or anything else you would read. I can see people are used to different styles but this is one that is the same as reading a book so should not be a problem for anyone - unless perhaps they have never read a book?

      My inbox orders messages in a variety of ways - the way I order them is latest message thread at the top, but within each thread the messages are ordered chronologically, oldest first, the way most people read a thread or conduct a conversation.

    2. Top posting only really works when you're having a single thread of conversation in the email (i.e. just a single question / answer per email). As soon as you have more than one thing per email then you have to start adding context such as "with respect to your second question..." before each part of the response so the recipient knows which bit of their original email you're responding to - at this point things start to become much more readable if you simply quote the bit of the original email you're responding to (i.e. traditional in-line replies rather than top-posting).

      Secondly, top posting tends to be a bit of a security risk - on several occasions, I've been brought into an existing (top-posted) thread by simply having my name added to the To: line - the person who did that clearly didn't bother to read through the lengthy list of quoted emails attached to the bottom, because if they had they would've noticed that they included some confidential data that I should not have been sent. Keeping the replies trimmed to only the relevant bits eliminates that risk.

      Furthermore, if you want to read back the old emails then this is what the "threading" display is for on your email client - relying on the quoted email chain on a top-posted email ensures that you'll miss parts of any non-linear thread. (And yes, I'm aware that some people have backward MUAs that don't support threading, but the answer to that is to upgrade your MUA rather than try to change the way everyone else works in order to suit your software's shortcomings... there isn't really a lot of excuse for not having threading - even things like PINE have supported threading since the 90s.)

  4. People who need a clue are everywhere. Today $energyco insisted that charging me 32 days standing charge in May is definitely correct. Twice! *slaps forehead*

  5. It sounds as if TT have fallen into the same "have you tried turning it off and on again?" low-skilled fault-handling trap that gives us so much "fun" dealing with BTW: they have a checklist and procedure for their low-skilled drones talking to other low-skilled drones about problems which only require low-skilled drones to deal with, like "my line's gone dead, send someone to join the bits back together/plug it back in please". As soon as it's something complex, you're off-script and really need someone clueful: Shibboleet time!

    Incidentally, have you got a good way of distinguishing between problems on the TTW and BT portions of the circuit? 'Wider Oxford area' is presumably a big enough area to be across multiple CableLinks, meaning the common components are all TTW-side in this particular case - but of course being able to determine that yourselves would be nice...

    1. To be fair to TT, they do have a shibboleet email address for us, but we don't want to abuse that and the main "incident desk" should really be able to cope with an "incident".

      I told them the 43 lines affected and that it was congestion and that we were seeing latency of 100ms and what time it started. That should really have been enough.

      When I was then asked to confirm which lines (again) and what the problem was (again) I did start to get annoyed.

      When I was asked to do a traceroute I realised they had no clue - we don't buy an IP service from them, we buy a PPP backhaul service over L2TP, so my latency report was, of course, related to that, but it seems I did not make that clear.

      When I was then told to send the same details yet again as my email was not clear, that is when I decided it was time to blog...

      The good news is that there are people with clue in TT, and they got engineers on site to fix the problem by 23:25. Good result.

  6. I'm starting to wonder if you should sell your monitoring software as a service to other ISPs. It seems like whatever wholesale monitoring software there is out there at the moment just isn't up to scratch.

    1. It is available, though it's not pure software: it's part of the Firebrick FB6000 range. ThinkBroadband have an FB6102 anyone can use for ICMP monitoring, given a static IP address.

      The monitoring A&A use is a lower-level version, though, using LCP echo (like an ICMP ping, but part of the PPP protocol itself). You can only do that if you are using Firebrick routers to handle all the PPP (or rather, L2TP, which is how BT deliver it to the ISP) traffic - which, of course, A&A do, but would be a major change for another ISP to adopt.

      What I found interesting on these graphs is that whatever the problem was, it seemed to develop gradually over the course of more than an hour, then stop abruptly once fixed.
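To make the LCP echo point concrete, here is a rough sketch of the packet layout involved, based on RFC 1661: an LCP Echo-Request has code 9, an identifier, a length, and a 4-byte magic number followed by optional data; the peer answers with an Echo-Reply (code 10). This is not the Firebrick implementation. The function names are hypothetical, and a real monitor would send the packet inside a PPP frame over the L2TP session and time the reply, which is not shown here.

```python
import struct

LCP_ECHO_REQUEST = 9   # LCP code values as defined in RFC 1661
LCP_ECHO_REPLY = 10

def build_lcp_echo_request(identifier, magic_number, data=b""):
    """Pack an LCP Echo-Request: code, identifier, length, magic number, data."""
    # Length covers the 4-byte header, the 4-byte magic number and the data.
    length = 4 + 4 + len(data)
    header = struct.pack("!BBHI", LCP_ECHO_REQUEST, identifier, length, magic_number)
    return header + data

def parse_lcp_echo(packet):
    """Unpack an LCP echo packet into (code, identifier, magic_number, data)."""
    code, identifier, length, magic_number = struct.unpack("!BBHI", packet[:8])
    return code, identifier, magic_number, packet[8:length]

# Arbitrary example values for the identifier, magic number and payload.
pkt = build_lcp_echo_request(1, 0x12345678, b"ping")
print(pkt.hex())
print(parse_lcp_echo(pkt))
```

Latency measured this way is the round trip between the two PPP endpoints, which is why a traceroute to some web site says nothing useful about it.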