Monday, 26 December 2011

The telco that stole Christmas

Our favourite telco have very much stolen Christmas this year, not just for me, but for an elderly couple in Malvern with no Internet or VoIP phone over Christmas.

The executive summary is as follows... Our favourite telco messed up and broke the configuration on a line, and it stopped working on Friday night before Christmas. The MD of the ISP (me) spent all day Christmas Eve, Christmas Day, and Boxing Day, chasing the telco to get it fixed - dispelling and disputing every excuse and escalating all the way to Director level within the telco. This meant pretty much every half an hour checking the status, updating the customer, chasing the telco on email, echat or call, and in-between reading terms and conditions to quote at them. I did have time to open a few presents with my family in the gaps.

But lines break, and take time to fix, surely. This is not a special case is it?

Well yes, but... For a start the fault was caused by them, not some tree falling on a line, but the config on their kit. Then they take ages to make any progress, but do find the cause (well done Archana). This is to no avail as they then say that people who can fix it only work normal working days. But hang on - they agreed to fix faults within 40 clock hours - this is not even trying to fix within that time, and offering something you won't even try to achieve is unethical, dishonest, or even fraudulent... Naughty telco.

Late Christmas Eve I get someone to change their mind and pass to a 24/7 team that should be able to help. Told I have to wait until morning. Hmmm

At 5am Christmas morning I find they have chosen to order a tie pair modification (TPM). This is wrong! The customer equipment, modem, line, MSAN and even BRAS are all working - just that the session is being killed by them after a second with message "Subscriber provisioning failed". It is not a faulty port. What really winds me up is that I took the time to make very very clear to the person on the phone on Christmas Eve that two of their tests were misleading. The RADIUS test incorrectly assumes all short sessions are a "reject" by the ISP and in fact we are accepting the session, and also their TAM system that tries a login will say PPP failed which is to be expected as the link is closed within a second and does not mean the port is faulty. Even so, at 5am they decide (as TAM test PPP is failing) that it is a port fault and needs a TPM - arrrrrrrrrrrg!
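
The distinction their tests missed can be sketched roughly as follows. The record fields, the 5 second threshold and the `classify_session` helper are all hypothetical, made up for illustration, and are not the telco's RADIUS test or any real API. The only point is that a short session the ISP accepted is a different thing from a reject.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionAttempt:
    # Hypothetical record of one login attempt, for illustration only.
    isp_accepted: bool                   # did the ISP's RADIUS accept the session?
    duration_seconds: float              # how long the session stayed up
    close_reason: Optional[str] = None   # reason reported when it was closed


def classify_session(attempt: SessionAttempt, short_threshold: float = 5.0) -> str:
    """Separate a genuine ISP reject from an accepted session killed early."""
    if not attempt.isp_accepted:
        return "rejected by ISP"
    if attempt.duration_seconds < short_threshold:
        return f"accepted by ISP, then closed early ({attempt.close_reason})"
    return "accepted and stayed up"


# The case in this post: accepted, then closed after about a second.
this_fault = SessionAttempt(True, 1.0, "Subscriber provisioning failed")
print(classify_session(this_fault))
```

A test that looks only at session length would put the first two cases in the same bucket, which is exactly the misleading result described above.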

To add to the fun, the first thing a TPM does is shut down the existing port (it is a process to move the service to a new port). This is mad when a line is partly working as it takes it out of service for up to 3 working days. In this case it means that at 6am the line stops trying to connect every 4 seconds and is no longer working. Our favourite telco tries to justify the TPM on the basis that the port is not working - but the TPM was 5am and the port stopped working at 6am. Time travel has been discovered it seems.

What is then strange, after a lot of shouting and even involving some direct telco staff member contacts at home at 07:22 on Christmas day morning (you are a star, Ian) and a duty manager who was working (well done John), we get the Director's Service Office involved. They work out the quickest thing now is to try and get a tie pair mod finished quickly. They are pulling strings, trying to get engineers on call and all sorts. Seems it is hard to get an engineer out on what is by then Boxing Day.

I am glad they are trying at last, but wait, this makes no sense, surely they have engineers anyway. After all they sell an "enhanced" level of support which means they work 7 days a week (even Christmas). So where are those engineers? And hang on... They have a service that means you can (for a fee) have an engineer working on this, aiming for a 7 hour fix. This means getting engineers to fix things at short notice is a standard service they offer (for a fee), so why is it a problem when they look to be in breach of contract for them to invoke those normal processes? No answer yet on that one!

Of course, one small gem in this is that at one point they said they cannot do anything Sunday as they do not have people working Bank holidays. They do have people working Bank holidays for the enhanced care. Just to add to the fun we pointed out that 25th Dec is not a Bank Holiday this year, 26th and 27th are. No reply on that one for some reason. It raises the point though - their definition of working hours (where it matters) is Monday to Saturday excluding bank holidays - so if the 25th Dec is Friday or Saturday that means the Saturday (not a bank holiday) is normal working hours. A fun one in future years I think.
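
To make the working-hours point concrete, here is a rough sketch in Python. It assumes the definition quoted above (Monday to Saturday excluding bank holidays) is read literally. The function name `is_working_day` and the hard-coded holiday sets are illustrative only and have nothing to do with the telco's own systems. The 2021 dates are an example of a year where the 25th fell on a Saturday and the bank holidays were moved to the 27th and 28th.

```python
from datetime import date


def is_working_day(day, bank_holidays):
    """Literal reading of 'Monday to Saturday excluding bank holidays'."""
    # date.weekday(): Monday is 0, Sunday is 6.
    return day.weekday() != 6 and day not in bank_holidays


# Christmas 2011: the 25th was a Sunday; bank holidays were the 26th and 27th.
holidays_2011 = {date(2011, 12, 26), date(2011, 12, 27)}
# Christmas 2021: the 25th was a Saturday; bank holidays were the 27th and 28th.
holidays_2021 = {date(2021, 12, 27), date(2021, 12, 28)}

for day, holidays in [(date(2011, 12, 25), holidays_2011),
                      (date(2021, 12, 25), holidays_2021)]:
    print(day, day.strftime("%A"), "working day:", is_working_day(day, holidays))
```

Under that literal reading the 25th in 2011 is excluded because it is a Sunday, not because it is a bank holiday, while a Saturday 25th counts as a normal working day.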

The big thing here, and one of the main reasons I personally as MD of an ISP have been so tenacious about this, is that they are not even trying. To be quite honest I think I would have been just as tenacious whether it was my parents' line or not! Over and over again they will ignore the promise they made to fix a fault in 40 hours. They spend a huge amount of effort making up excuses rather than actually fixing the fault. It is especially wrong when the fault is theirs in the first place and is a "soft" fault (i.e. fixed by someone at a computer terminal).

The upshot is a lot of reading of terms and conditions which makes matters worse for them. They say they will fix within 40 hours. They say that if a visit to site is needed they will respond within working hours. The two statements are not incompatible in any way. We are quite happy for a fix in 40 hours but no "response" until working hours. The delay in "response" does not remove or change the obligation to fix in 40 hours, and the only way to achieve that is to have people working bank holidays. If they don't then they are not doing or even trying to do what they agreed to do (fix in 40 hours) and that is serious - not just breach of contract but if the initial offer was made knowing they would not try, then you have possible fraud issues. We will pin down what working hours are even when 25th Dec is a weekend and we will pin down what their contractual obligations are even if I have to pay a legal advisor. Plan is to put the exact outcome on a detailed web page so our customers know where they stand.

Whilst I am happy to say that I will always fight this cause, I am dismayed that it is a fight at all - it should not be - they should be prepared to do what they agreed (they make these terms and conditions) and work with us to make that as efficient as possible. We want that. We want lines fixed, not a huge machine in place to make excuses. They even have a process for being rude, called "hard turn back" where they will point blank refuse to help you - surely someone must have realised they were turning to the dark side when they invented that process!

Maybe worth saying what should have happened in this case, in my opinion... Their systems agreed there was a fault at the start. So someone should have looked at it, and perhaps even felt out of their depth. It is an odd one. They should have passed to someone that could understand it, and they should have passed to a team that can fix it. This could well have taken several hours, but the next morning we should have had a clear diagnosis and fix in place and been back on line. There should have been updates at each stage, even if hours apart. That is the ideal world. No need for me to bounce it back at every stage. No need to pass to departments that don't do 40 hour fixes. No need to try and make a Sunday into a Bank Holiday or invent any more excuses, and certainly no need to involve the Director's office. If they had done that then a handful of people would be involved for a matter of minutes each. Instead I dread to think how many people have spent how many hours on this. I alone have spent most of three days on this, days I should have been with my family, or at least in Azeroth...

As it stands they are really not working together on this, and are making it a battle. When they have an ISP like us that stands up for what is right and wants to actually fix faults they will always have a battle unless they change that attitude. We can be a valuable asset or a major foe, and it is their choice. But maybe their heart is two sizes too small... Bah Humbug...

Happy ending? To be added (see below). As of time of posting the line is still not working - in fact after eventually getting a TPM and chasing why it did not work for some hours we have got back to the state the line was in at the start, on Friday night - connecting and being closed by their end with "Subscriber provisioning failed". I wonder what next.

Update: 18:32:47 Boxing Day - on-line. Finally. Simply needed a "rebuild" of RADIUS and mux config - something they could have done Friday night.
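
For a sense of scale against the 40 clock hour target, here is a small lower-bound calculation. The post only says the line failed "Friday night", so this sketch assumes the latest possible start, midnight at the end of Friday 23 December 2011. The real figure is therefore somewhat higher. The variable names are just for illustration.

```python
from datetime import datetime

TARGET_HOURS = 40  # the "40 clock hours" fix target discussed in the post

# Assumption: latest moment that still counts as "Friday night".
latest_possible_start = datetime(2011, 12, 24, 0, 0, 0)
# Time the line came back, taken from the update above.
back_online = datetime(2011, 12, 26, 18, 32, 47)

elapsed_hours = (back_online - latest_possible_start).total_seconds() / 3600
overrun_hours = elapsed_hours - TARGET_HOURS

print(f"at least {elapsed_hours:.1f} clock hours elapsed")
print(f"at least {overrun_hours:.1f} hours over the {TARGET_HOURS} hour target")
```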


  1. The sad fact Adrian is that nothing will change.

    BT are too fat. No one will get in trouble, no one will get fired, nothing will change at all.

    Maybe, just maybe, they will get another $lawyer to write another couple of pages of Ts & Cs that only one person ever reads anyway. (you)

    There will be no attempt to try and improve services to the customers. BT have what amounts to a monopoly and don't give a fig.

    Please don't stop trying to make it better Rev but I sometimes think that you would be better spending your energies inventing cold fusion where there is at least a chance of success?

    Bah! I sound like a humbug. I'm going to go play with IPv6 for a giggle.

  2. Great to hear somebody is holding BT to their T&Cs in the same way they seem to do with their customers. I know you often get accused of just being pedantic or difficult, but I really do think you've raised some very valid and important points.

    It's interesting to see instances where you're able to prove, using irrefutable evidence, that they are doing what I often expect, but can't prove - making up excuses. Well, maybe that's being too nice about it. It's lying, plain and simple.

  3. Instead of ruining your Christmas and that of a director at BT, you could have let things be and waited until after Christmas, and let the Christmas skeleton staff at BT get on with real emergencies such as telephone lines down at hospitals etc. You know, no Internet at a private residence is not the end of the world, life does go on.

    You are also wrong about the SLA and working hours. Whilst Christmas Day may not technically be a bank holiday if it falls on a Saturday, it is recognised as a Public Holiday under Common Law due to common observance (which is also why the first statutory regulation of bank holidays, the Bank Holiday Act 1871 did not declare Christmas Day a bank holiday). BT's SLA may not be 100% exactly worded if they state "Monday to Saturday except bank holidays", but sadly for nerds like you, it is an established principle of English law that words in contracts be interpreted not literally but according to their ordinary grammatical meaning and in the context of the contract. It is clear to any person with common sense, except you, that "Monday to Saturday except Bank Holidays" does not include Christmas Day or Boxing Day, should they fall on a Saturday.

  4. go proactive the backup team do and it proves worthwhile and always available to help out :)
    there is and always is callout to fix issues


  5. What the hell? let them get out of what they actually agreed to do?

    Why would I do that? It is not like Christmas crept up on them without them knowing about it.

    The fact that it is not a bank holiday is not actually relevant, just an amusement, as one of the many excuses they tried to invent. They contract to fix the fault in 40 clock hours. There is a caveat for working hours but it is just for where a visit to site is necessary and only affects the timing of a "response"; it does not remove their agreement to fix within 40 hours, and in this case the fault did not require a visit to site anyway. I included that as an example of them making up excuses rather than fixing the fault, and an amusement that the excuse they made up (claiming that being a bank holiday was the problem) was flawed as 25th Dec this year was categorically not a bank holiday. Any reasonable person looking at the DTI web site to see what bank holidays we have would know that.

    Yes, I could have ignored it and left it to next year, but I didn't. I cared. They should do what they agreed and if that means employing staff to do it then they need to do that. They should not have a complete machine for fighting their customers and inventing excuses. They should simply have fixed this (a simple matter of reconfiguring the mux and msan) when it was reported and everyone would be happy and nobody would have been hassled. They made this worse by inventing excuses, avoiding fixing the problem and insisting on a pointless engineer visit. All I did was expect them to do what they agreed which is not unreasonable is it?

  6. Thanks Colin, I am quite impressed that I was contacted by the back up team to help on this.

    As for call out, it seemed like this was a lot of hassle. That is odd as I could have reported this as a paid expedited fault with a 7 hour target fix - so engineers have to be on call. The fact normal enhanced care is 8am to 6pm working 7 days a week including Christmas means there must be engineers (albeit fewer) working these days. So I do not know why it is such hassle for them to sort the mess. Glad they did though.

  7. Just in case it is not that clear, I am very happy with the senior account team and duty manager who tried to help and eventually managed to get this sorted. I am unhappy with the whole "machine" that is in place to fight us rather than fix faults, and the incompetence of many of the front line staff. Yes, a few were good but then did not have the power to fix the fault.

  8. Well if your SLA really states they will fix faults not requiring a site visit within 40 clock hours, and the fault did not actually require site visit, then they have exceeded their SLA (from Friday night to Boxing Day afternoon it is at least 60 hours) and you should claim the service credits you may be due. However, have you checked carefully if it really is "clock hours" with no exception (for public holidays etc.) and the SLA is to fix an individual fault as opposed to a mean time to repair (MTTR) averaged over the faults occurring during a period?

    As regards your complaints about BT's non-agile response. I perfectly understand your description of the fault in the recount of your story, and if your account is correct, the fault could have been fixed by remote hands, whereas BT did not realise that, at least not initially, and they did not take your input to that effect into account. You have to bear in mind, however, that an organisation like BT cannot always act in the same agile way as you do. They deal with thousands upon thousands of incidents every day, so they need to follow strict processes to have any hope of managing such a workload. The process may seem cumbersome to the individual user affected by the fault, and yes, the processes may lead to unnecessary steps being taken, as in your case, but this is unavoidable in the greater scheme of things. It is simply not possible at a large telco to deal with every single incident on an individual basis.

    As for the bank holiday issue, you keep reiterating that Christmas Day was not a bank holiday. It was a Sunday, however, and therefore not included in the "Monday to Saturday" SLA in the first place. (I know that this only relates to site visits, but at that stage BT believed a site visit was necessary).

  9. The SLA does say they will fix in 40 hours if no site visit is needed, and their staff have agreed no site visit was needed, and indeed the site visit they insisted on doing left the line in the same state proving no site visit was needed.

    It really is "clock hours" and they really do use that phrase, and it is for each individual fault. In fact they don't say they won't fix faults needing a site visit in 40 clock hours, they just say they will only "respond" within office hours if a site visit is needed. I don't care when they "respond" as long as they do the "fix" in 40 clock hours as agreed, but this did not need a site visit.

    As for claiming compensation as per SLA - of course we will do that. That is the agreed compensation for a breach (peanuts). My point is that you cannot simply promise something like a 40 hour fix and not actually bother to try and achieve that - doing so would at least be unethical if not fraudulent. Yes, there will be exceptional circumstances, but they should at least try to achieve the agreed target not fob people off.

    They did realise it could be fixed by remote hands. One of the first echats on Saturday morning had a very competent person who worked out the exact nature of the fault. The issue then was that the team he thought could fix it were not working until 28th. That is "not trying to fix the fault within 40 clock hours". In escalating that they made the bad choice to order a TPM without asking me and against my previous advice that their tests were misleading and exactly why. They ignored me, and did it anyway and have no process to correct (undo) a TPM order.

    What you say about them having problems of scale is not unreasonable, but those processes should not involve inventing invalid excuses, and fighting the ISP. As soon as they realise they have a problem with the process they need to fix it and have a system to do so. The escalation system should accommodate that but was not adequate in this case. It needed a lot of involvement of the account team to make any progress and finally needed the DSO. This is all as per the escalation process agreed (i.e. one of the valid agreed reasons to escalate is that they are not meeting the 40 hour target or not likely to, so valid to keep escalating all the way to DSO if that is the case). OK contacting someone at home is not part of that but he is a mate I knew could advise me.

    The 25th is not part of their normal working hours definition because it was a Sunday. My problem was that they kept insisting the issue was that it was a bank holiday not that it was a Sunday. That was a made up and incorrect excuse which deserved to be shot down, sorry.

    They do need to care more, and they need to take notice of someone that has a clue what they are talking about. Between my knowledge and monitoring and the competent person on the echat we had this well diagnosed Saturday morning. If only they had acted on that diagnosis rather than trying to fob me off and misdirect me.

    Now, apart from working out exactly what they mean by the terms in their SLA and making sure we and they clearly and unambiguously understand them, we also have been discussing ways they can improve matters. We are trying to work with them to make things better for everyone, not just A&A customers. A news item like this helps put pressure on them to work towards that goal and perhaps work with us not against us.

  10. I'm so glad I no longer have to deal with BT on a professional basis (at least it was professional from one side of the relationship).

    I go through something similar with BT (via my ISP who are in your mould, but not your tenacity) pretty much every time they do "transparent upgrades" in the regional aggregator node - net result is that ESR1.Edinburgh5 rejects my PPP session before even attempting to pass it to my ISP.

    Thankfully they're woeful on scheduling such changes so I only get knocked out about 3-4 times per year but you'd think that it's not beyond the wit of man to see there's a pattern to the failure...

    BT have insisted on sending an Openreach boy out with a laptop and a USB modem (no sign of a ladder and/or any tools) to confirm what I already knew - there's sync on the line, the exchange is talking to the Regional ESR... ie. not a premises fault - Imagine my surprise a week later when Openreach turn up with a van with a cherry picker to commence a full physical reprovision of the underground line with an overground one from a telegraph pole out the back - this fully 5 days after the (software) fault was resolved.

    Professionally I had a site with a BT leased line that went offline reliably as soon as there was more than 5mm rainfall - they always sent a man to fix it but took so long that the sun was back out and the fault had resolved itself. Only when I took a look at the exchange did the root cause become obvious - the exchange at the bottom of a ramp, all the rain flowed down the hill - as it does - and straight into the "service" duct of the exchange. Crappy joint in mucky water = loss of service. That one was finally fixed when they put a speed bump on the exchange entrance driveway that diverted enough water away from the exchange... as far as I know the crappy joint is still there. Same site they managed to cease (in software only) the wrong ISDN30 bearer because *their* labelling at the exchange end was wrong (I can't believe that they have to send someone to site to check a physical label - must be labelled in the software surely?) and told us standard terms (6 weeks) to reprovision - told them Cable & Wireless were due onsite to provide 2 circuits that day, I could easily change the order to 3...

    The left hand of BT doesn't even know that there *is* a right hand.

  11. So how much money has it cost them to fix this? They presumably had to send engineers to an exchange for the TPM, they had staff on phones and emails discussing it, they had technicians looking at it, they had directors responding to directors and trying to sort things out.

    And all it took was probably no more than half an hour at a terminal to fix a fault that you had already correctly diagnosed to them.

    If they actually listened to their customers (you) and implemented some feedback then all this and likely countless other completely pointless TPMs, exchange visits, site visits etc would have been prevented resulting in cost savings, shareholder happiness and satisfied customers.

    Of course the problem is that when somebody with sense joins them and tries to sort things out they are met with brick walls of idiocy and jobsworths who will block every attempt at making things work. I know a few people, good people, who have joined the telco and left just because it is impossible to change things and they have given up.

    See, they should never have privatised them. ;-)

  12. Indeed. They have some good people. I was actually surprised at the echat on Saturday morning that worked out the issues. That is surprising, which in itself is a shame. They have good people that want to make changes (my previous account manager is one). I think they make it very difficult for people with that sort of insight and drive to get anywhere.

    The account team were pretty good at sorting, reasonably quickly, a mess that should not have happened.

    I hope this gets investigated internally. Sadly I suspect it will, and the outcome will be more ways to stop people like me doing embarrassing things like this rather than anyone trying to fix the underlying problems.

    FYI we have put forward various ideas for the future of broadband provision that even fit with their corporate schizophrenia model but reduce a lot of points of failure and uncertainty over responsibility.

  13. Malvern seems to be a black hole as far as BT/Openreach is concerned. FTTC has been scheduled for 31 Dec 11 - the cabinets are in and we have been expecting switch on, well, like now. The BT checker today (6/1) changed the availability for FTTC to 31 March 2012!!! aargh. When will they ever learn how to work with their customers. Your comments about them are spot on.