2024-10-06

One Touch Switching

It has been some weeks since One Touch Switching went fully live.

TOTSCO say there have now been over 100,000 switch orders, so it is making good progress, in principle at least.

In practice a lot is working. In terms of volume, the key players, as well as the likes of A&A, are all working reasonably well now, and switches are happening in both directions, with correct billing as well, which is good.

But there are still some challenges.

  • Whilst I cannot go into detail, even the big players are still facing some issues. Most are small, some are bigger, and some have workarounds for now while fixes are rolled out as updates. Daily calls with industry continue (yes, I have taken some from the pub).
  • A lot of smaller players are catching up, but many face the challenge of the huge holes in the specifications. These are still frozen, so grey areas, errors, and contradictions abound, and the issues only become apparent once a CP is live with other ISPs. In at least one case we have to ask how a CP managed to go live at all, as they had some fundamental errors (like the SOR not being a UUID) that should not have passed even the very poor testing TOTSCO claim to do. Even larger CPs do not entirely agree on some field specifications, because of poor wording in one of the better parts of the spec, but they are working on it. The daily calls help, but they only include the early/bigger CPs.
  • There is now a formal process for inter-CP communications to resolve such issues, but not everyone is on it yet. There is a fallback process as well. So things are happening, and some of these issues with smaller CPs are being fixed as well.
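A SOR that is not a UUID is exactly the kind of fundamental error that a trivial check catches. Below is a minimal Python sketch, assuming the message has already been parsed into a flat dict. The field name `switchOrderReference` and the function names are hypothetical, used for illustration rather than taken from the TOTSCO spec.

```python
import re

# Canonical 8-4-4-4-12 hex form of a UUID, e.g. 123e4567-e89b-12d3-a456-426614174000.
# uuid.UUID() alone would be too lenient for this check, as it also accepts
# braces, a "urn:uuid:" prefix, or no hyphens at all.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_uuid(value: object) -> bool:
    """True only if value is a string in canonical UUID form."""
    return isinstance(value, str) and UUID_RE.fullmatch(value) is not None


def check_sor(message: dict) -> list[str]:
    """Return a list of problems with the SOR (empty list means it looks fine).

    'switchOrderReference' is a hypothetical field name for this sketch.
    """
    sor = message.get("switchOrderReference")
    if sor is None:
        return ["missing switchOrderReference"]
    if not is_uuid(sor):
        return [f"switchOrderReference is not a UUID: {sor!r}"]
    return []
```

Returning a list of problems, rather than raising, makes it easy to run many such checks over one message and report everything wrong with it in one go.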

But even now, even this weekend, I have seen incorrect messages and errors. I have reported them, of course, and they may end up as defects on the daily calls; we will see.

I worry what will happen when the daily calls stop: reporting an issue to a CP may mean they ignore it rather than spend resource investigating, fixing, and deploying. At present CPs have that resource assigned, but that will not last forever.

What next? Well we keep at it.

The next big step I can only hope for is an unfrozen spec and a lot of clarifications and updates. It will be interesting to see how that process happens, and how we can be involved. I have a lot to say on clarifying the specifications and I hope I can be involved in making it happen. But every change will need a lot of agreement, and in some cases changes by all CPs.

For now, there are some silly compromises, like a maximum of 256 characters for all strings. This resulted from a global update to a Swagger definition system, rather than any informed debate or formal specification change, and is annoying as TINYTEXT in MariaDB holds at most 255 bytes, not 256 characters.

Even so, some agreement on even the vague magnitude of things like correlationID is a good start. I suspect, in practice, that one may get defined as smaller, like 64 characters. In hindsight it should have been a UUID, unique per message, but it is too late to do that now. The problem is that the smaller/newer CPs are not in on that discussion, so do not know about it. Big CPs guessed at 36 characters (UUID size), 56 characters, 64 characters, and so on, as there was simply nothing in the spec, yet most had to set some limit in their code. We changed our handling within the first few days, once we understood how broken the spec was, and now handle any size (well, up to megabytes), but other CPs do not, and we have limits on a load of other strings anyway.
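The 256 versus 255 mismatch is easy to trip over. Here is a short Python sketch, assuming the spec's limit counts Unicode characters (code points, as Python's `len()` does) and that the MariaDB column stores UTF-8 such as `utf8mb4`. The constant and function names are hypothetical.

```python
# The blanket spec limit described above, versus MariaDB TINYTEXT,
# which stores at most 255 bytes.
SPEC_MAX_CHARS = 256
TINYTEXT_MAX_BYTES = 255


def fits_spec(s: str) -> bool:
    """Within the spec's 256-character limit (assumed to count code points)."""
    return len(s) <= SPEC_MAX_CHARS


def fits_tinytext(s: str) -> bool:
    """Fits a TINYTEXT column, assuming the column stores UTF-8."""
    return len(s.encode("utf-8")) <= TINYTEXT_MAX_BYTES


# A spec-valid 256-character ASCII string is one byte too long for TINYTEXT,
# and non-ASCII text (e.g. accented names in addresses) hits the byte limit
# much sooner: 200 copies of "é" are 200 characters but 400 UTF-8 bytes.
for sample in ("a" * 255, "a" * 256, "é" * 200):
    print(len(sample), fits_spec(sample), fits_tinytext(sample))
```

In other words, a column sized to hold anything the spec allows needs a larger type than TINYTEXT, such as TEXT.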

For now, I have every message we receive, and every message we send, run through my NOTSCO checker and reported to me. I feel it is only fair to test us as well. Over time it will only be problem messages that I need to monitor. It has actually highlighted some issues in what we were sending (mainly where customers manually type an address; I have added more checks now). But monitoring real-life messages has also meant updates to our checking in the live system, and to my NOTSCO tools.

My latest changes include actually using the longer agreed size of correlationID to tag the message type as a (small) suffix on a UUID, so that we can quickly (before any database connection or lookup) check that messages we get back are sensible, and reject them if not. Why? Well, one small CP is sending nonsensical replies to replies, or reusing correlationIDs from previous messages with different messages, both of which we can now pick up in milliseconds and cleanly reject. It looks like they are working on it, but there has been no actual communication back to us, which is a shame; we're happy to help and advise, if only they would talk to us.
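The suffix idea can be sketched in a few lines of Python. This assumes a reply echoes the correlationID of the message it answers. The tag letters, message-type names and function names are all hypothetical, chosen for illustration rather than taken from the TOTSCO spec or our live code.

```python
import re
import uuid

# Hypothetical one-letter tags for message types we originate.
TAGS = {"matchRequest": "M", "orderRequest": "O", "cancelRequest": "C"}

# Hypothetical reply types that make sense for each tag we sent.
EXPECTED_REPLIES = {
    "M": {"matchConfirmation", "matchFailure"},
    "O": {"orderConfirmation", "orderFailure"},
    "C": {"cancelConfirmation", "cancelFailure"},
}

# Our format: a lowercase UUID (as uuid.uuid4() produces) plus "-" and a tag.
OUR_ID_RE = re.compile(
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})-([A-Z])"
)


def new_correlation_id(message_type: str) -> str:
    """Fresh UUID plus a short type suffix: 38 characters in total."""
    return f"{uuid.uuid4()}-{TAGS[message_type]}"


def quick_check_reply(correlation_id: str, reply_type: str) -> bool:
    """Cheap sanity check, run before touching the database.

    Rejects IDs not in our format (e.g. a reply to one of our replies,
    which would carry the other CP's ID) and replies whose type does not
    fit the tagged request (e.g. an order confirmation reusing a match
    request's ID). Passing does not prove we issued the ID; the database
    lookup that follows still has to confirm that.
    """
    m = OUR_ID_RE.fullmatch(correlation_id)
    if m is None:
        return False
    return reply_type in EXPECTED_REPLIES.get(m.group(2), set())
```

The design choice is that the quick check is only a regular expression and a dictionary lookup, so nonsense can be rejected in well under a millisecond, without opening a database connection, and the 38-character IDs stay within a 64-character limit.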

Overall - OTS is happening and mostly working, so do try it when you want to switch telco.

2 comments:

  1. So far we've only had 19 match requests, and none of them are for addresses/customers that have anything to do with us.

  2. The thing amusing me is that a number of ISPs' radio adverts are hinting that OTS is their idea and a USP of their service.
