2024-08-01

TOTSCO load

OK, the UK has 66 million people. Assume EVERY SINGLE ONE, man, woman, child, has a separate broadband link, wow!

Assume every single one wants to switch broadband every single year.

So 66 million broadband switches a year.

  1. That is roughly 180,000 a day
  2. Roughly 7,500 an hour
  3. Roughly 2 a second.

So TOTSCO would handle, on that massively exploded logic, 2 a second.

FFS a single Raspberry Pi could do that.

Even "office hours only" pushes to like 6 or 7 a second maybe.
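
A minimal sketch to check the arithmetic above. The 8 hours a day used for "office hours" is an assumption picked for illustration, as are all the variable names.

```python
# Back-of-envelope load: every person in the UK switches broadband once a year.
SWITCHES_PER_YEAR = 66_000_000
DAYS_PER_YEAR = 365
OFFICE_HOURS_PER_DAY = 8  # assumption for illustration, not a TOTSCO figure

per_day = SWITCHES_PER_YEAR / DAYS_PER_YEAR
per_hour = per_day / 24
per_second = per_hour / 3600
# Same daily volume squeezed into office hours only
per_second_office = per_day / (OFFICE_HOURS_PER_DAY * 3600)

print(f"{per_day:,.0f} a day")
print(f"{per_hour:,.0f} an hour")
print(f"{per_second:.1f} a second")
print(f"{per_second_office:.1f} a second in office hours")
```

With those assumptions the office hours figure lands between 6 and 7 a second, in line with the guess above.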

Why is this a whole company, and AWS servers and whatever?

Update: To be clear this is very flippant and verging on a joke. Yes, the load will be very little, and yes, the peak may be more, but not hugely, and yes, you need redundant servers and so on. But the whole thing feels like massive overkill the way it has been done, that is all.

2024-07-30

12:60

I understand a clock face showing 12 at the top and numbers around 1 to 11, and then back to 12.

As a programmer, the issue of starting counting from zero or one is a massive issue.

But for time of day, we understand the clock goes 11:59, 12:00, 12:01, and that 12:00 is both midnight (and ongoing minute) and midday (and ongoing minute). I won't even start on the concept of "pre" and "post" meridiem and what 12:00 exactly means.

But what has always puzzled me is the use of minute marks that include a "60" at the top. I have always considered minutes as 00 to 59. But clearly some old clock faces considered that minutes, like hours, go 1 to 60, or 60, 1, to 59.

This leads me to wonder. Does the time, for those that created and used such clock faces, go (at around midday) :-

  • 11:59
  • 11:60 (midday)
  • 12:01

Or does it go

  • 11:59
  • 12:60 (midday)
  • 12:01

And then I wonder on seconds, using the same logic?

  • 11:59:59
  • 11:59:60 (midday)
  • 11:60:01
  • ...
  • 11:60:59
  • 11:60:60 (one minute after midday)
  • 12:01:01
  • 12:01:02

Or what?

Does anyone know what the actual logic was historically for the use of "60" as a minute, and why "60" was ever on a clock face?

And I know, leap seconds 23:59:60 exist, but that is a different matter!

2024-07-24

TOTSCO 66 is guidance, optional

I feel I need to explain this.

The TOTSCO call today, first I have been on, and wow!

But a key point was TOTSCO bulletin 66, which is actually quite sensible guidance.

So what is the problem? It is guidance, not mandatory. CPs don't even have to follow it.

So let me try to explain.

If ANY CP follows that guidance then ALL CPs have to change how they create a source correlationID to be totally unique.

The API specification does not require that, so it is a real change.

If some other CP does not do that, the recipient CP, following the guidance, may assume a duplicate message and discard it.

This is non-trivial.

2024-07-23

OFCOM disinterest in OTS?

OFCOM sent a mildly threatening letter about One Touch Switching and the impending deadline.

I replied by email, twice, nothing. So I wrote, and nothing.

So now this.

We'll see if they reply.

Update: OFCOM want a call, yay!

Update: Useful call, OFCOM listening.

OK, being constructive

I am pondering what could be done right now. So some thoughts...

  • Firstly - this is not simply pedantry, or my getting pissed that I misread the spec - this is not hypothetical. Working with other CPs, and monitoring testing on my NOTSCO test platform, yesterday, half the CPs testing were falling foul of the latest checks on source correlationID added because of TOTSCO bulletin 66. I see other CPs are running in to all the issues I have raised with the specifications. Most of the errors my test platform picks up would not be picked up by the existing TOTSCO testing process. 
  • I feel some people with some clue how to write a clear specification, and who understand the challenges of coding systems to meet such a specification, need to be engaged with TOTSCO and taken seriously. I can help (though probably not for free - though I have made many suggestions anyway).
  • I feel the specifications need to be consolidated and simplified and put in one place - there are too many parts, in different places, some freely available, some under a login on the control pages, some XLS, some a web page, some PDF, and so on, it is a total mess. What is needed is a clear and complete set of specifications in one place.
  • TOTSCO need the specifications updated, and kept updated, and a process to notify updates to all CPs involved, so they can ensure compliance. This means proper change controlled notices of what has changed, not a random bulletin that assumes/implies a serious change to a spec that is in a change freeze! Even if this was a weekly spec update with all changes sent to all CPs.
  • I feel going for 12th Sep is fine - we have to start somewhere - but as the start of full OTS usage, and not as a requirement for all CPs to be online, simply because I am not sure there is time for that. But from that date, all CPs that are live on TOTSCO should offer it as part of their ordering process, for switches involving other CPs that are on TOTSCO.
  • Some later deadline for all CPs, maybe an even later one for small CPs.
  • I definitely think a self service test platform for API and OTS, with all sorts of scenarios (valid and error testing) and messages both ways, needs to be in place, and be a key part of compliance testing. I have one, and I am happy to work with TOTSCO if they want to use it. But it literally took only a couple of days to make, so TOTSCO could make one themselves. Testing should be to a reference implementation and against the specification. In practice making this a CP on pre-production (and even live), called TEST, with a control page on TOTSCO to manage tests and replies and logs, would be ideal.
  • We also seem to lack a way to contact other CPs when live to address issues - and a way for TOTSCO to arbitrate when one CP claims another CP does not meet the spec. A clear spec is needed, but a whole inter CP dispute process needs to be in place - and a reference test system would be invaluable for that.

2024-07-22

TOTSCO moving goal posts, again!

One of the big issues I had in initial coding was the use of correlationID on messages. The test cases showed it being used the same on a sequence of messages, e.g. a Switch Order had a destination correlation which only made sense if it was a response to a Match Confirmation, for example. I was wrong, but not for lack of reading the spec.

The API spec says this: In a source element, the correlationID must always be provided, the format can be anything the originator chooses to support their messaging process but should be sufficiently unique to allow correlation of response with request over a reasonable period.

This makes it clear what purpose the correlation ID has, it matters to sender so they can correlate response with request. It also makes it clear the sender is who chooses the correlationID.

Now, for that purpose a Match Request, and subsequent Switch Order, and Switch Order Trigger could all have the same correlationID. Indeed, arguably, a sender could use the same correlation on all Switch Order related messages because the messages all carry a Switch Order Reference, which can be used to tie the response to a specific order. An obvious choice, and we nearly did this, was to use the actual switch order reference as the correlationID.

Also, there is nothing to stop an originator, when generating a reply, from using correlationIDs differently, as they don't expect a response to that reply, and there is no correlation of response with request. Again, an obvious choice for the various switch order messages would be the switch order reference, as this is the one thing missing from a MessageDeliveryFailure message, and would allow that error to tie to a switch order.

TOTSCO Bulletin 66

TOTSCO just released bulletin 66, on handling received (from hub) messages better, notably on response times and validation, but also on handling duplicate requests. They detail a recommendation that the messages are cached for a while, per originating RCPID and source correlationID, and use this to spot a duplicate.

If a sender chose to use the same correlationID for a Match Request and Switch Order, which is definitely sufficiently unique to allow correlation of response with request as per the spec, the recipient would see the Switch Order as a duplicate message and ignore it, maybe resending the Match Confirmation.

If the sender chose to use the SOR on switch order messages or replies, the recipient would see all messages after the first as duplicates, and ignore them.
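
To make the failure concrete, here is a minimal sketch of the kind of duplicate check the bulletin describes, keyed on originating RCPID and source correlationID. It is not TOTSCO's code or anyone's real implementation: the class name, the one hour retention, the RCPID "RAAA", the reference "SOR-12345" and the message labels are all hypothetical.

```python
import time
import uuid


class DuplicateCache:
    """Remember (RCPID, correlationID) pairs for a while to spot repeats."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # retention period, an assumed value
        self.clock = clock
        self.seen = {}          # (rcpid, correlation_id) -> time first seen

    def is_duplicate(self, rcpid, correlation_id):
        now = self.clock()
        # Forget entries older than the retention period
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.ttl}
        key = (rcpid, correlation_id)
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


cache = DuplicateCache()

# Sender reuses the switch order reference as correlationID on both messages,
# which the API spec wording allows.
reused = [
    ("MatchRequest", cache.is_duplicate("RAAA", "SOR-12345")),
    ("SwitchOrder", cache.is_duplicate("RAAA", "SOR-12345")),
]

# Sender uses a fresh UUID per message instead.
unique = [cache.is_duplicate("RAAA", str(uuid.uuid4())) for _ in range(2)]

print(reused)  # the second message is flagged as a duplicate
print(unique)  # neither message is flagged
```

The recipient never looks at the message type, so a perfectly valid Switch Order is thrown away purely because its correlationID has been seen before.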

So now, in effect, based on just a bulletin, the specification mandates that every message sent (request or reply) has a unique correlationID, something not in the spec. In general this is a good idea, but the API spec should have stated that at the start! It now means the source correlation ID matters to the recipient as well, not just the sender. And they have not changed the spec as it is in a change freeze. Oh, and there is no size limit for a correlationID.

The bulletin does not even actually say the sender correlationID has to be unique, it basically assumes it is and explains how recipients can assume it is for spotting duplicate messages!

Once again, a fiasco.

P.S. Our implementation does unique source correlationID already (uses a UUID).

Also, I have updated the NOTSCO test platform to warn of duplicates, and to generate a duplicate as well, to test CPs' handling of duplicates.

Just to add, the confusion caused by the poor specifications is real. Not just that we were confused by the examples implying a way of working, but I monitor the NOTSCO testing and see other CPs doing similar things, based on the specification, that are going to be problems. I'm just waiting for this new check to kick off and show a CP assuming they can pick source correlationIDs for their own purposes (this did happen later in the day). In fact, looking at logs today (we only keep for a day) I already see duplicated correlationIDs that will break when sent to any CP following TOTSCO Bulletin 66.

This is a bigger issue than you realise!

We originally coded with a way of working with correlationIDs that would fall foul of any CP following bulletin 66. We changed later once TOTSCO confirmed that basically its test cases are wrong.

I am seeing now half of the CPs testing on NOTSCO hitting the duplicate test.

The whole way TOTSCO do testing is two random CPs testing against each other. That would NOT have picked up this at all. So the CPs carry on.

Then, wham, on 12th Sep, some OTS messaging breaks because one of the CPs followed the spec (which has NOT BEEN UPDATED) and one implements the de-duplication in bulletin 66.

The fact TOTSCO do ZERO formal testing against the spec is just a serious problem - that is just irresponsible. I'm amazed OFCOM allow it.

2024-07-21

Bulk ESP32-S3 programming

Programming an ESP32-S3 is really easy.

The S3 has built in USB, which means literally just connecting GPIO 19 and 20 to D- and D+ on a USB socket - not even any resistors! It operates as a USB device out of the box, appearing as a serial/JTAG port. It just works on standard USB serial drivers on linux and MacOS (and I assume, Windows).

Using the ESP IDF tools I can type.

idf.py flash

And that is it, it detects the chip, and flashes the bootloader and code.

No special leads, it is that simple.

Smaller footprint

The only issue is that this all works if you have the complete ESP IDF installed, with its python and cross compiler environment, and your code checked out and built (or able to build). This is not hard, there are simple steps to do this, but it takes a lot of space.

So, I wanted something simpler so I could make a small machine, ideally a Raspberry Pi, that just flashed code. Thankfully, all I need is esptool, i.e.

pip install esptool

And then I can flash using that rather than the whole IDF. It is more complex, e.g.

esptool.py --chip esp32s3 -p /dev/ttyACM0 -b 460800 --before=default_reset --after=hard_reset write_flash --flash_mode dio --flash_freq 80m --flash_size keep 0x0 release/LED-S3-MINI-N4-R2-bootloader.bin 0x10000 release/LED-S3-MINI-N4-R2.bin 0x8000 release/LED-S3-MINI-N4-R2-partition-table.bin 0xd000 release/LED-S3-MINI-N4-R2-ota_data_initial.bin

But that is simple to script. One tool installed and the binaries from my repository, and job done!
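
As an illustration of how that might be scripted, here is a minimal sketch that builds the same esptool command from a board name and runs it. The function names and the release_dir parameter are hypothetical. The flags, addresses and file naming are copied from the command above.

```python
import subprocess

BOARD = "LED-S3-MINI-N4-R2"  # build name taken from the command above
PORT = "/dev/ttyACM0"


def flash_command(board=BOARD, port=PORT, release_dir="release"):
    """Build the esptool argument list for one board."""
    images = [
        ("0x0", "bootloader"),
        ("0x10000", None),  # the application image has no suffix
        ("0x8000", "partition-table"),
        ("0xd000", "ota_data_initial"),
    ]
    cmd = [
        "esptool.py", "--chip", "esp32s3", "-p", port, "-b", "460800",
        "--before=default_reset", "--after=hard_reset", "write_flash",
        "--flash_mode", "dio", "--flash_freq", "80m", "--flash_size", "keep",
    ]
    for address, suffix in images:
        name = f"{board}-{suffix}.bin" if suffix else f"{board}.bin"
        cmd += [address, f"{release_dir}/{name}"]
    return cmd


def flash(**kwargs):
    """Run esptool; True if it exited cleanly. Needs esptool installed and a board attached."""
    return subprocess.run(flash_command(**kwargs)).returncode == 0


print(" ".join(flash_command()))
```

Keeping the command as a list avoids shell quoting issues, and the board name is the only thing that changes between builds.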

One device after the next

The challenge is that I want to do bulk programming - i.e. flash a device, get clear confirmation it worked, then just plug in the next device. I don't want to run a command each time.

Getting confirmation it works is easy as all my boards have an LED, usually a tiny 1x1mm WS2812 colour LED, and that starts blinking as soon as the board starts. Indeed, the code is signed and checked on boot, so if any issues flashing it won't start.

Indeed, where I have done this I have had three separate instances running, with 3 USB ports and leads, so I could plug in one after the other, unplugging when I see it is flashed and running. Really slick!

What I was doing was

idf.py flash monitor

This flashes, and then runs, and monitors serial output (which can be useful if there are additional diagnostics to show, but the main indicator is the on board LED).

The problem is you then have to kill the monitor for each board (ctrl ]). Even just disconnecting USB appears to wait for device to reconnect. I created a convoluted bit of C code to run monitor, and check output, looking for the string it gets for a new device, and exit. That way I could flash, and then run this, in a loop. Works well.
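
The "watch the output and exit on a known string" idea can be sketched in a few lines. This is not the C code mentioned above, just a Python illustration of the same approach. The marker b"EXAMPLE-MARKER" is a placeholder, since the actual string is not given here, and the function name is hypothetical.

```python
import io


def wait_for_marker(stream, marker, max_bytes=1_000_000):
    """Read from a binary stream until marker is seen.

    Returns True if the marker appeared, False on end of stream
    or after max_bytes without a match.
    """
    window = b""
    for _ in range(max_bytes):
        chunk = stream.read(1)
        if not chunk:  # end of stream, e.g. device unplugged
            return False
        # Keep only the last len(marker) bytes for comparison
        window = (window + chunk)[-len(marker):]
        if window == marker:
            return True
    return False


# Stand-in for serial output; a real run would pass the opened port instead.
fake_output = io.BytesIO(b"boot noise\r\nEXAMPLE-MARKER\r\nmore output")
found = wait_for_marker(fake_output, b"EXAMPLE-MARKER")
print(found)
```

A wrapper script can then loop: flash, wait for the marker, repeat for the next board.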

The problem is that, once again, this is using the whole ESP IDF just to run the idf.py command. And it seems esptool does not do a monitor function!

My own monitor code

In principle it is really easy to make my own C code to open the USB (serial) port directly, and set DTR and RTS appropriately to reset the board in running mode (rather than bootloader mode).
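
For illustration, the DTR/RTS part could look something like the sketch below, in Python rather than C and using the standard modem-control ioctls on a Unix-like system. It assumes the usual Espressif auto-reset convention, where RTS drives reset and DTR selects bootloader mode. That mapping is an assumption here, not something confirmed above for this board, and the function name and 0.1 second pulse are hypothetical.

```python
import fcntl
import struct
import termios
import time


def reset_to_run(fd, ioctl=fcntl.ioctl, sleep=time.sleep):
    """Pulse reset with the bootloader select line released.

    Assumes RTS = reset and DTR = bootloader select (an assumption).
    ioctl and sleep are parameters only so the sequence can be checked
    without hardware.
    """
    dtr = struct.pack("I", termios.TIOCM_DTR)
    rts = struct.pack("I", termios.TIOCM_RTS)
    ioctl(fd, termios.TIOCMBIC, dtr)  # DTR off: do not ask for bootloader mode
    ioctl(fd, termios.TIOCMBIS, rts)  # RTS on: hold the chip in reset
    sleep(0.1)
    ioctl(fd, termios.TIOCMBIC, rts)  # RTS off: release reset, code runs


# Usage against a real port (not run here):
#   with open("/dev/ttyACM0", "rb", buffering=0) as port:
#       reset_to_run(port.fileno())
```

Whether the lines actually reach the chip depends on the serial driver honouring these ioctls.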

This worked perfectly on my Mac. Some simple code, waits for the right string to indicate a new board, and exits. It also does not need the whole ESP IDF to run.

But no!

  • The first issue is that the ESP32, with no code loaded, seemed to trip the power on the USB port. It is odd, and maybe the regulator I am using creates just enough of a power spike, or something (never bothered my Mac), I don't know. The fix was a powered USB hub.
  • The next issue is that once code is loaded, even with a powered USB hub, it seems the start up with WiFi is enough to then trip the power, so it constantly resets and does not blink the LED.
  • I finally found a powered hub that just works with linux.

But there is weirder!

The other weirdness was that on the Raspberry Pi, it seems it would not play properly with RTS and DTR and constantly came up in bootloader mode regardless. I simply could not get it to play, it was like DTR was not being set. The only difference seems to be it is using an OTG serial driver. On two separate bigger linux boxes, using a different driver, it works as expected (and ends up in a boot loop, as I said above).

I don't know how one can change the serial driver on a Pi, suggestions welcome (google did not help me).

2024-07-20

TOTSCO - the top level - ordering

This should give you some idea of the issues with a simple matter of providing a broadband service. Bear in mind the broadband service may have a linked telephone service - i.e. be ADSL or VDSL on a phone line, and the customer may, or may not, want that number to carry on working somehow.

It used to be we could take over the broadband and leave the telephone alone, or, we could take over number and broadband as a BT line, or we could take over broadband and port the number to VoIP.

It is more complicated with the retirement of old fashioned phone service - we cannot move the line to broadband with us on a telephone line any more, we have to move to something called SOGEA or SOADSL, which is a broadband service with no telephone service on the line. So we have to offer the customer the choice to lose the number or move it to VoIP.

So let's look at some of the combinations we have to handle, and do One Touch Switching for...

  1. It could be a service that is totally different, like Starlink or something - we provide new broadband and OTS co-ordinates the cease. Simple.
  2. More likely, BT/Openreach broadband and BT/Openreach phone service using a BT number range number. Yes, that specific set (regardless of resellers, which may not be the same for broadband and telephone) is special as we can do an integrated port moving broadband and porting phone as one order in to BT. As you can imagine working out it is this exact combination can be tricky, and end user may not know.
  3. Could be BT/Openreach broadband, and a BT/Openreach phone line, but not a BT number range number, in which case we migrate the broadband and port the number separately as we cannot do an integrated port.
  4. Could be BT/Openreach broadband, and MPF phone line, in which case harder to check, and we can port the number separately as we cannot do an integrated port.
  5. Could be BT/Openreach FTTP with an associated phone number which may even be VoIP, but is linked at the BT account so would die if migrating broadband. I think that has to be a separate number port, but not sure - it may allow an integrated port if a BT number range. We'll have to test that one to be sure.
  6. Could be BT/Openreach broadband and BT/Openreach phone service, but the new service is FTTP, so a separate physical service. This can be coordinated to allow old broadband to be ceased but leave phone line in place, at least for now.
  7. Could be BT/Openreach with no phone number associated, yay! simple migrate.
  8. Could be CityFibre which won't have a phone number, yay! simple migrate.

For the OTS, somehow we have to explain the options to the customer so they can make an informed choice!

Porting the number adds an extra step too, now.

  1. The OTS match for broadband using number to identify it may (or may not) come back with an option to retain/cease, or we could do the OTS with IAS and NBICS "port" request, making one "switching order" for broadband and number port, if that is offered as an option.
  2. The OTS match may or may not mention a number linked to the line, depends if the reseller of the broadband knows if there is a number and what it is - the number could be a totally different reseller. But we may be able to work out the service has a BT/Openreach number based on the broadband checking in BT. If the customer knows the number we may be able to do an integrated port on the broadband. It is not impossible for neither the old broadband retailer, nor us, to know there is a number, and then that number gets zapped - so we have to ask the customer if they are sure, regardless.
  3. If the broadband OTS does not have a number port on the same switch order, we'll have to do a secondary OTS for the number port, possibly with different retailer, for the same address. Then we have to manage and track two switch orders. We probably need to do that even for the integrated port option.
  4. Either the broadband, or the number, or both may not be able to do an OTS check if the service is a business, or the retail provider is not on TOTSCO yet, so we have to handle that.

At the end of the day, this is a couple of extra pages of stuff to fill in on our order forms for customers now! It also adds new ways for things to go wrong.

The very small light at the end of the tunnel is the telephone number porting OTS should advise the Network Provider and the CUPID which should allow the port to go smoothly. We're looking forward to testing that!

QR abuse...

I'm known for QR code stuff, and my library, but I have done some abuse of them for fun - I did round pixels  rather than rectangular, f...