2024-06-12

NOTSCO (Not TOTSCO)

I posted about how inept TOTSCO seem to be, and the call today with them was no improvement.

It seems they have two test stages...

  • A "simulator" to prove basic connectivity, well, sort of.
  • Pre-production (i.e. live with other CPs, but we all know it is "test data")

They seem to be missing the obvious: a proper simulator platform that can simulate communications with another CP using TOTSCO, both ways. This would have the advantage that the testing is against the spec, not against other CPs and their interpretation of the spec.

Missing link

So how do we address this missing link: a platform to test TOTSCO as if talking to another CP, but without actually doing so?

Well, we, like other CPs, I am sure, made some simple test systems before going to TOTSCO. But external testing is invaluable. Even if the external systems have it wrong in terms of following the spec (as long as they will fix it), they won't have the same errors as you have. The best external test would be TOTSCO, making a proper CP to CP simulator system.

But it does not exist - so, as you might expect, if you know me, I am making it, for free.

Yes, I am making a proper, useful, TOTSCO test platform.

  • No need to book a test slot, just sign up and use it for as little or as long as you need.
  • Configure the responses you want to a match request.
  • Send ad hoc messages to your system.
  • Send deliberately wrong messages to test your error checking.
  • Test as you go, an ideal way to test your code as you develop it.
  • Logging and reporting messages each way, in detail, with errors and warnings detected.

From a privacy perspective, I am not expecting personal data to be stored, but we are deleting all test data at the end of each day anyway.

Work in progress

Well, I had the idea yesterday, and have a test platform now. By the weekend I expect to launch this properly and fully working.

I am only posting now, as loads of ISPs have already "signed up" even without my announcing it! The code is all open source and a lot of people clearly follow my github!

It is free, sponsored by FireBrick, helping small ISPs, as we always do.

Obviously not officially endorsed by TOTSCO, well, not yet!

[You will need to sign up again when I launch it]

Update (Thu 3pm)

I now have sending and receiving of match requests working, with detailed syntax checking and reporting.

Next is syntax checking match response, and the various follow up messages, and then sending "bad" test messages.

What is really interesting is that testing my own TOTSCO system against this new system has already found bugs. Bugs that TOTSCO would not have seen, and would only be highlighted by interacting with other CPs.

This really shows how necessary a system like this is.

Update (Fri 2pm)

I now have everything in place I think - though there are bound to be some bits I have missed, so if you find any problems, or suggestions, raise them on GitHub.

Even today, it has highlighted errors in my OTS code.

The next step is deploying on a VM for people to have a go with it.

2024-06-10

Working with TOTSCO

This is hopefully going to help other small ISPs that will have the same challenges.

As I explained in my previous post, we have to work with TOTSCO to set up One Touch Switching. Well, we are doing that now that TOTSCO actually exists. The new deadline is September, but we want to ensure we are working well before that.

Specifications

The specifications are not too bad. They have a few inconsistencies, which I have fed back to them. But I was able to code the system reasonably quickly. I created my own test system to act like TOTSCO so I could test my code with messages in and out in advance.

The underlying system is, as I say, just a messaging process between telcos. It can use OAUTH2, which is simple, and involves JSON messages each way, which is also simple. I use C and a load of long standing in-house JSON libraries, but most people would use some other platform with standard JSON libraries, I am sure. It should be pretty simple. Obviously the hard part is integrating with whatever back end systems and processes the ISP uses, oh, and cleaning up address data, including UPRNs, for matching services.

Simulator

TOTSCO have a simulator, which is good. It will allow testing against them. It has been two weeks since I finished coding it all, and I am only just now on the simulator, but it is a mess, so far.

  • The token issuing URL had an invalid certificate (wildcard, but one level too high). I ignored that to get further testing.
  • The directory URL did not work (404). This provides (or should provide) the list of ISPs, basically.
  • The messaging URL simply said "Error connecting to the back end".

Well, that is not a good start, but after several days of chasing, they finally asked me to check I am using the correct URLs. Good thing to check, but I was, as per the spec.

  • They fixed the token certificate, good, but the reply did not say they fixed it. The new cert now uses a different CA that libcurl does not know, or some such, which is fun. But at least it is valid.
  • They told me to use the directory path but on the token issuing host, which makes no sense. Re-reading the documentation it certainly implies the directory URL is an "API" and so you would expect to use the API host. So that is weird. But it still did not work (404 Not Found). I eventually found it works if I add the optional parameter &identity=all. Well, it is meant to be optional, and is a GET form style argument, so how it was giving 404 is beyond me. Interestingly, with that, it works on token host and API host, so even weirder.
  • They told me to use a path for the messaging that starts /testharness/ which is not as per the specification (which states /letterbox/). So basically the simulator does not follow the specification! Using testharness gets further but a different error this time.
  • Oh, and the directory I get has RCPIDs (Retail Communications Provider IDs) which don't meet the specs, so, of course, my code barfs trying to put them in the database which was set for 4 characters, as per the specification. So again, the simulator does not meet the specification.
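One way to stop bad directory entries breaking a database insert is to check RCPID syntax when loading the directory, and log and skip anything that fails. A minimal C sketch, assuming an RCPID is exactly 4 ASCII letters or digits: the length of 4 is from the spec as described above, but the allowed character set is an assumption, so check the spec before relying on it. `rcpid_valid` is a hypothetical helper, not part of any TOTSCO library.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if rcpid looks like a valid RCPID.
 * Length 4 is per the spec as described; restricting the characters
 * to ASCII letters and digits is an ASSUMPTION. */
bool rcpid_valid(const char *rcpid)
{
    if (rcpid == NULL || strlen(rcpid) != 4)
        return false;
    for (size_t i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)rcpid[i];
        if (c >= 0x80 || !isalnum(c))
            return false;   /* reject non-ASCII and punctuation */
    }
    return true;
}
```

In a directory loader this would be called per entry, with failures logged and skipped rather than passed on to the database.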

Some progress

Well, surprisingly, we have a quick response now.

  • They say that the duff RCPIDs are dummy entries. OK, but surely they should at least have correct syntax, as otherwise it is sensible for my end to reject them.
  • They just say testharness should work, but I have to use specific RCPIDs for testing, good (would be nice if that was documented, maybe I missed something). But they really need to fix it to actually follow the spec and use letterbox.
  • I got as far as testing a match request and them trying to send a reply. They get an OAUTH2 Bearer token, and then try and post a message, but the message they post does not use the same bearer token I issued to them, so is rejected.
  • I can see what they tried to post and it does not have the right source and target RCPIDs or correlationIDs, so again I would reject them if they actually authenticated.
  • Oddly, after more tests, they are now using the right bearer token, but still the wrong IDs.

The irony here is that part of my coding was to make a simulator for my own testing before going to TOTSCO, and so far my simulator is way better than theirs!
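The checks described above, that a reply must carry the bearer token we issued and the right source/target RCPIDs and correlationID, can be sketched in C like this. The `envelope` and `pending_request` structs and their field names are hypothetical stand-ins for whatever your JSON parsing produces, not the real TOTSCO message layout. The token is compared with plain `strcmp`; a production version would want a constant-time comparison.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed fields of an incoming message. */
typedef struct {
    const char *source;          /* sender's RCPID */
    const char *target;          /* recipient's RCPID, should be us */
    const char *correlation_id;  /* should echo our request's ID */
} envelope;

/* Hypothetical record of a request we sent. */
typedef struct {
    const char *our_rcpid;
    const char *their_rcpid;
    const char *correlation_id;
} pending_request;

/* Check "Authorization: Bearer <token>" against the token we issued.
 * (Scheme matched case-sensitively here, for simplicity.) */
bool bearer_ok(const char *auth_header, const char *issued_token)
{
    const char *prefix = "Bearer ";
    size_t n = strlen(prefix);
    if (auth_header == NULL || issued_token == NULL)
        return false;
    if (strncmp(auth_header, prefix, n) != 0)
        return false;
    return strcmp(auth_header + n, issued_token) == 0;
}

static bool same(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Accept a reply only if it is authenticated AND matches our request. */
bool reply_ok(const char *auth_header, const char *issued_token,
              const envelope *reply, const pending_request *req)
{
    return bearer_ok(auth_header, issued_token)
        && same(reply->source, req->their_rcpid)
        && same(reply->target, req->our_rcpid)
        && same(reply->correlation_id, req->correlation_id);
}
```

Note that the ID checks run even when the token is right, which is exactly the case where the simulator's replies would still be rejected.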

Next steps

I have come to the conclusion that the simulator is actually useless. It does not simulate either the TOTSCO messaging platform (as it does not actually use the right URLs, or provide a sensible directory, or actually do OAUTH2) nor actual end to end messaging (as it does not do source/target RCPID or correlationID correctly).

What really puzzles me is that we know we are not the first to do this, and we know some of the big telcos have done this. So how have other ISPs not ripped TOTSCO to pieces over this stupidity already?

Follow up call

We have had a call. They explain that the simulator is totally dumb: it cannot be told to initiate any messages, and all it does is send one of two fixed replies to a match request (depending on the RCPID to which it is sent). It is meant to test connectivity.

But they want to do more than just two match requests and replies: they want us to send the order, update, trigger, and cancel requests.

This makes no sense, as the match requests test connectivity both ways already. And, of course, my system will not do that as it has not received a valid switch order confirmation reply. The fixed text they send is not valid as wrong RCPID and correlationID, so we don't accept it and don't store the switch order reference. And as such it does not see a switch order we can place or update or trigger or cancel.

I could fake such messages, but that is not testing my system.
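Why the fixed reply cannot lead anywhere can be shown as a small state machine: nothing past the match stage is allowed until a valid match reply has supplied a switch order reference. A rough C sketch; the state names, `switch_rec`, and the simplification of treating a sent message as accepted (ignoring confirmations) are assumptions for illustration, not the actual message flow in the TOTSCO spec.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical states for one switch, as seen by the gaining provider. */
typedef enum { SW_NONE, SW_MATCHED, SW_ORDERED, SW_TRIGGERED, SW_CANCELLED } sw_state;

/* Hypothetical follow-up messages after a match. */
typedef enum { MSG_ORDER, MSG_UPDATE, MSG_TRIGGER, MSG_CANCEL } sw_msg;

typedef struct {
    sw_state state;
    char switch_order_ref[64];   /* empty until a valid match reply stores it */
} switch_rec;

/* Call ONLY after the match reply passed the RCPID/correlationID checks. */
void on_valid_match(switch_rec *s, const char *ref)
{
    strncpy(s->switch_order_ref, ref, sizeof s->switch_order_ref - 1);
    s->switch_order_ref[sizeof s->switch_order_ref - 1] = '\0';
    s->state = SW_MATCHED;
}

/* Is msg allowed in the current state? */
static bool can_send(const switch_rec *s, sw_msg msg)
{
    if (s->switch_order_ref[0] == '\0')
        return false;            /* no reference stored: nothing to act on */
    switch (msg) {
    case MSG_ORDER:   return s->state == SW_MATCHED;
    case MSG_UPDATE:
    case MSG_TRIGGER:
    case MSG_CANCEL:  return s->state == SW_ORDERED;
    }
    return false;
}

/* Send msg if allowed and (simplification) assume it is accepted. */
bool send_msg(switch_rec *s, sw_msg msg)
{
    if (!can_send(s, msg))
        return false;
    if (msg == MSG_ORDER)   s->state = SW_ORDERED;
    if (msg == MSG_TRIGGER) s->state = SW_TRIGGERED;
    if (msg == MSG_CANCEL)  s->state = SW_CANCELLED;
    /* MSG_UPDATE leaves the order in SW_ORDERED */
    return true;
}
```

With the simulator's invalid reply rejected, `on_valid_match` is never called, so every order, update, trigger, or cancel is refused.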

They say that if I email explaining this, they will move us to the pre-production platform. This is the same as live, but with other CPs.

What they seem to lack is any sort of useful simulator that handles messages both ways as if to another CP. This would seem a sensible step before going to pre-production testing.

2024-06-07

One Touch Switching

OFCOM have come up with a few things that are perhaps a tad questionable in terms of their benefit or practical application (in my personal opinion, of course). Sanity checking CLIs is one which created back scatter and broke useful services, but putting that aside, the latest is "One Touch Switching".

So what is it?

The concept seems relatively simple - a residential consumer with fixed location broadband (i.e. internet access) or telephone ("Number Based Interpersonal Communications Service") should be able to easily switch to a new provider. They should be able to do it as "one touch", i.e. their one order with the new provider.

Does this make sense?

Well, maybe. From a consumer point of view, for many people, the fact that moving from one "Openreach back end" broadband provider to another is different from moving from one technology to another may be confusing. Fair enough.

It is different for a reason - if you have a broadband service provided over Openreach based copper (or worse, aluminium) wires, you can change provider by the new provider working with Openreach to change what is attached to those wires and the ISP to which it is routed, and pretty much seamlessly move from one ISP to another. Of course ISPs vary, some don't even have IPv6, and some use CGNAT, and some filter or log stuff, so not really "switching", but OK.

But if it is a different technology, e.g. moving from VDSL on wires, to some radio (WiFi) service, or Starlink, or Virgin, or mobile, or, well, anything that is a different technology, the process is different.

But it is not complicated! It is order new service and cease old service. If you have any sense you arrange an overlap to ensure new service works well for you before old service stops, as it is not the same "wires". (and no, you cannot easily arrange that overlap now!)

OFCOM could have mandated that "ceasing" a service has to be simple and easy. That would have made any change of technology simple. They chose a different path.

How does it work?

Well, that's another problem, as OFCOM said to "industry, you have to do this", and expected something magically to happen. It did not, and has been delayed. Eventually some new company called TOTSCO has been created that is co-ordinating it.

This new system is simply a way for one telco to talk to another, with some quick, well defined (ish), messages to handle the process. Spoiler, it is JSON!

Basically the new provider ("gaining" provider) messages the old provider ("losing" provider) to match a customer and address, and if that all works they can start, and then later finish, a "switch". Old provider is expected to email customer with any early termination charges and stuff, good.

What it does not do?

It does not actually change the switching, migrating, or porting systems in place now. It simply adds a new layer.

If the process involves some migration or porting, that happens the same as it ever did. If it does not, e.g. changing broadband from Virgin to Starlink, all it does is coordinate the cease of the old service when the new one starts.

More work for customer!

Our broadband provide and migrate order forms are complex enough: we have to know the exact address, what service we can offer, and whether it is migrating from another Openreach service. But now we have an extra layer on top to match the service from the old provider. It saves the customer ceasing the old service if it is a change of technology, but for a migrate it makes no difference, and just adds more that we have to ask and more that can go wrong.

But for some people it may help, especially if ceasing an old service would be hard work. Some ISPs seem to make it hard work. So some good, maybe.

It seems to also stop most "anti-slamming" measures - not allowing losing ISP to cancel a migration now!

The old systems still needed!

However, the new system is only fixed location internet access or telephony, and only consumers. Anything else still has to work as before, business services, and services that are not fixed location. And even for the cases the new system applies, the old systems to migrate and port are still needed to make it happen.

Some hope?

Maybe, just maybe, number porting, which seems to involve a lot of manual work now, could be improved using some new messaging system used for One Touch Switching. If so, that will be good.

The issue here is many VoIP services are not "fixed location", so are outside the scheme. We have had lots of issues with people porting numbers to us where the "address" did not match, when in fact the losing provider's idea of "address" dates from years before the number was moved to VoIP. The new system simply does not apply to non "fixed location" services, so that will be no help at all. A system like mobile ports, using a "PAC", may be way better, and is not location dependent.

For us, when porting a telephone service from a fixed location, it may help, as it may confirm the address match and the losing access provider, so porting (which still has to use the same old system) may be more reliable. We hope so.

What's in a surname?

I mentioned a lack of any means to avoid "slamming", a forced change of ISP/telco. This could be someone hijacking customers, or some end user being malicious and migrating someone's service for fun or malice or fraud.

The one thing the new system expects is a match of surname. They have a cryptic requirement to remove accents, but that is messy: depending on language and alphabet, simply "removing" an accent is far from "equivalent" to the unaccented form. We have done that in a crude way, but we do have to match surname.
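A crude fold of the sort described might look like this C sketch. It lower-cases ASCII, maps accented letters in the Latin-1 Supplement block (U+00C0 to U+00FF) to a single unaccented letter, and refuses anything it cannot map (such as ß or Æ) rather than guess. The case-insensitive comparison and the choice of table are assumptions, not TOTSCO's stated rule, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Map a U+00C0..U+00FF code point to an unaccented lower-case ASCII
 * letter, or 0 if there is no single-letter equivalent (e.g. Æ, ß, Þ). */
static char fold_latin1(unsigned cp)
{
    if (cp < 0xC0 || cp > 0xFF) return 0;  /* outside this crude table */
    if (cp == 0xFF) return 'y';            /* ÿ */
    if (cp >= 0xE0) cp -= 0x20;            /* à..þ mirror À..Þ */
    if (cp >= 0xC0 && cp <= 0xC5) return 'a';
    if (cp == 0xC7) return 'c';
    if (cp >= 0xC8 && cp <= 0xCB) return 'e';
    if (cp >= 0xCC && cp <= 0xCF) return 'i';
    if (cp == 0xD0) return 'd';
    if (cp == 0xD1) return 'n';
    if ((cp >= 0xD2 && cp <= 0xD6) || cp == 0xD8) return 'o';
    if (cp >= 0xD9 && cp <= 0xDC) return 'u';
    if (cp == 0xDD) return 'y';
    return 0;
}

/* Fold a UTF-8 surname into out (size n): ASCII lower-cased, Latin-1
 * accents stripped. Returns false if anything cannot be folded. */
bool surname_fold(const char *in, char *out, size_t n)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;
    if (n == 0)
        return false;
    while (*p) {
        char c;
        if (*p < 0x80) {
            c = (char)tolower(*p);
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            unsigned cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
            c = fold_latin1(cp);
            if (c == 0)
                return false;
            p += 2;
        } else {
            return false;                  /* 3/4-byte UTF-8: not handled */
        }
        if (o + 1 >= n)
            return false;
        out[o++] = c;
    }
    out[o] = '\0';
    return true;
}

/* Hypothetical match rule: equal after folding (so case-insensitive). */
bool surname_match(const char *a, const char *b)
{
    char fa[128], fb[128];
    return surname_fold(a, fa, sizeof fa)
        && surname_fold(b, fb, sizeof fb)
        && strcmp(fa, fb) == 0;
}
```

"Müller" matches "MULLER" here, but "Straße" never matches "Strasse", which shows how crude "removing accents" really is.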

So we have allowed customers to set the surname on their broadband services. This is not for VoIP as our VoIP is not fixed location, so will never match for One Touch Switching anyway, and needs old school porting out.

What I have now put on the web site re slamming is:-

For a long time we have operated an anti-slamming option where you tell us in advance that you do not wish your broadband to be migrated to a new provider. You could then change that at any time.

However, the new One Touch Switching system works differently. We will no longer be able to reject switching. However, to start switching, the new provider needs an address and surname to match. They can still start a switch process in BT without them, but this is less likely, as the normal process for consumers, and probably most businesses, will be One Touch Switching.

Because the surname has to match, we now allow you to edit the contact name on each line you have with us. Your name is what you want it to be, so picking any name for any circumstance is your right, and we have to respect that and allow you to change your name under GDPR, even if only on that very specific part of our system - the contact name for a broadband service.

If you change your surname, even if it is to PSJKHGJGEXC, then that is your choice. And any One Touch Switching match request would fail unless using the surname PSJKHGJGEXC.

Obviously this is meant to be for your surname not really as a pseudo password, but, well, it is up to you.

2024-05-30

Hot tubs are expensive (again)

Yes, my hot tub is expensive.

Our whole house total power consumption was, typically, 55 to 60 kWh per day. Which is a lot. I have some excuses, servers in the loft, air-con for heating and cooling in various ways, and, of course, the hot tub.

The average hot tub usage was 20kWh per day.

Simple change

The simple change anyone can do is insulation. The hot tub bucket has some foam coating stuff to insulate, but there are a lot of pipes connecting and holding (hot) water. These are inside some simple panels, and are not insulated.

The first surprise was how much difference the panels make. They are just thin fibre board of some sort, not obviously designed to insulate.

This was the previous normal hot tub power profile :-

As you see, it is high when heating in the morning after being off all night, and when in use, but when idle is around 25% duty cycle maybe.

We removed the panels (to help turn it around, and ready for installing a heat pump). This was a surprising difference :-

The duty cycle, when not in use, when idle, was more like 75% or more. I emptied and refilled the tub from cold and it took 24 hours of full power to get to temperature. Yes, the lid was on.

This shows the side panels make a massive difference!

You can see why!

So what is the simple fix?

Lagging, multi layered loft insulation in fact, and a lot of silver tape, and quite a few hours.

The problem is I don't know how much this has helped, as it was done on the same day as the heat pump conversion - two changes at once. But it is a cheap change and I bet it helped a lot.

I should have done this years ago!

More expensive fix

The more expensive fix is a heat pump conversion. I spent £2299 in total on heat pump and installation.

It took a few hours...



It works by sitting in line with the circulation pump and with the internal resistive heater disabled (it actually has a relay to allow it to be used if really too cold for heat pump to work). The heat pump then operates whenever the circulation pump is on, leaving the hot tub to control temperature as normal, thinking it is working the resistive heater.

So, what's the difference?

Firstly the power usage is way lower: the total for the heat pump and the circulation pump is around 1kW, where before it was 3.5kW. The other change is the duty cycle, which is lower. But I cannot be sure how much is down to the heat pump and how much is down to the insulation.

One big statistic is heating from cold, after a change of water.

  • With no side panels, resistive load 24 hours at 3.5kW. So around 84kWh.
  • With side panels, resistive load, back in January, 12 hours at 3.5kW. So around 42kWh.
  • With side panels, insulation, heat pump, 6 hours at 1kW. So around 6kWh.

So what do the bigger stats say?

Average usage for May was 43kWh/day. I am seeing examples as low as 30kWh/day though. It seems the whole exercise has saved maybe 15kWh/day. But May is disproportionate, with over 102kWh of tumble drier rather than a normal 42kWh, due to someone having a broken bathroom :-)

It also means I am now regularly making enough solar, with battery storage, to run the house on overnight charge only, and have net profit on export, even in May, even on some gloomy days!

Last week's total electricity bill was 41p.

2024-05-22

ISO8601 is wasted

Why did we even bother?

Why create ISO8601?

A new API, new this year, as an industry standard, has JSON fields like this:
"nextAccessTime": "2023-May-18 04:43:00+0000 UTC"

I mean, pick a lane, why "+0000" and "UTC"?

Why "YYYY-MName-DD" FFS, that is not *any* standard in RFC or ISO?!

I just don't know how they could have come up with that in any sane way.

The xkcd "cat" format would be saner!

(FYI, it is TOTSCO)
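For anyone who has to consume it, converting that format to actual ISO 8601 is at least easy. A minimal C sketch, assuming the field always uses English three-letter month names and the literal `+0000 UTC` suffix shown above; any other offset is rejected rather than guessed at, and there is no range checking of day or time. `totsco_to_iso8601` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical converter: "2023-May-18 04:43:00+0000 UTC" -> "2023-05-18T04:43:00Z".
 * Only the +0000 UTC form is accepted; returns 0 on success, -1 otherwise. */
int totsco_to_iso8601(const char *in, char *out, size_t n)
{
    static const char *const months[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    int y, d, hh, mm, ss, end = -1;
    char mon[4], off[6];

    /* %n records how far parsing got, so trailing junk can be rejected */
    if (sscanf(in, "%4d-%3[A-Za-z]-%2d %2d:%2d:%2d%5s UTC%n",
               &y, mon, &d, &hh, &mm, &ss, off, &end) != 7
        || end < 0 || in[end] != '\0')
        return -1;

    int m = 0;
    for (int i = 0; i < 12; i++)
        if (strcmp(mon, months[i]) == 0)
            m = i + 1;
    if (m == 0 || strcmp(off, "+0000") != 0)
        return -1;                      /* unknown month or non-UTC offset */

    int len = snprintf(out, n, "%04d-%02d-%02dT%02d:%02d:%02dZ",
                       y, m, d, hh, mm, ss);
    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```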

2024-05-12

Debugging

There are lots of ways to debug stuff, but at the end of the day it is all a bit of a detective story.

Looking for clues, testing an hypothesis, narrowing down the possible causes step by step.

It is even more, shall we say, "fun", when it is not definitely a software or definitely a hardware issue. Well, to be honest, we know it is hardware related, but it could be hardware because the software has set something up wrong, or is doing something wrong, maybe. Really a processor hang should not be something software can ever do no matter how hard it tries, in my opinion, but in a complicated system with complicated memory management hardware, it is possible that a hang can be the side effect of something wrong in software.

I was going to say that "when I was a kid, software could never cause a hardware hang", but I am reminded not only of the notorious "Halt and Catch Fire" accidental processor operation, but that one could walk into a Tandy store and type the right POKE command on one of the earliest Apple machines and turn it into toast, apparently. So maybe there has always been this risk.

The latest step in the "watching paint dry" process of trying to diagnose the small issue we have with the new FireBricks is underway now. It has been a long journey, and it is too soon to say it is over. It will be an awesome blog when it is over, honest.

One of the dangers with software is the classic Heisenbug: a bug that moves or goes away when you change something. We are chasing something which, by our best guess, is related to some aspect of memory access. This means that even the smallest change to software can have an impact. Make the code one byte shorter and you move all the interactions with cache lines when running code, and change the timing of everything as a result. When chasing a bug like this, you cannot rule out those effects being an issue. So a change of one thing may result in a change in behaviour somewhere else. We have seen a lot of red herrings like this already.
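To illustrate why a one byte change matters: which cache lines a piece of code or data touches depends only on its address and length. A small C sketch, assuming 64-byte cache lines (common, but not a statement about the FireBrick's processor); `lines_spanned` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE ((uintptr_t)64)   /* ASSUMPTION: 64-byte cache lines */

/* Index of the cache line containing addr. */
static uintptr_t line_of(uintptr_t addr)
{
    return addr / CACHE_LINE;
}

/* How many cache lines the byte range [addr, addr + len) touches. */
size_t lines_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    return (size_t)(line_of(addr + len - 1) - line_of(addr) + 1);
}
```

For example, a 64-byte loop starting at 0x1000 fits in one line, but move it one byte earlier to 0xFFF and `lines_spanned` gives 2, so every pass round the loop touches an extra line.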

The latest test is unusual for us. It is a change to an auxiliary processor that controls a specific clock signal to the processor before the code even starts to run. One we don't currently need. And we are removing anything we don't need, no matter how unlikely it is to be the cause.

What is fun is that this means we have not changed a single byte of the main code we are running.

If this works, and only time will tell, we can be really quite sure it is not some side effect of simply recompiling the code. It pretty much has to be the one thing we really did change.

Being able to test something so specific by a software change is quite unusual.

Data packages

Our old SIP2SIM was "pay as you go", and the new one has monthly capped data packages.

To be honest, people have asked for this for a long time, but as ONSIM are selling us data packages, it makes sense to do the same, at least for now. Monthly 2GB, 4GB, 10GB, 20GB, 40GB. It is also more sanely priced than before.

But, of course, it is not simple.

So, for a start, adding data to a non-data SIM mid month means pro-rata data for the rest of the month at a pro-rata price. So far so good.

But what about an increase of data package mid month? My thought on this (and it depends on ONSIM) is that we update to the new monthly package, pro-rata if data started mid month, and do the same for the price. Mostly it will be an increase for the whole month to the new monthly rate, charging the difference in monthly price.

But what about a decrease? Well, I guess, maybe, the same logic could apply, but only if you have used less than you would now have for the month. My thought is no: lowering the package sets a new lower level for next month. This is far simpler, with no billing implication and no change to this month.

Of course if you then increase again, we have to allow for the fact that this month you are on a higher package than you will be next month, and only consider it an increase relative to that.
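The rules above can be sketched in C as follows. This is a minimal sketch under stated assumptions: pro-rata by whole days with integer rounding down, prices in pence supplied by the caller (no real prices here), 1GB treated as 1000MB, and nothing about ONSIM's actual API modelled. `sim_package` and `change_package` are hypothetical names.

```c
/* Hypothetical per-SIM package state. Packages are in GB (e.g. 2, 4, 10,
 * 20, 40); prices are monthly, in pence, supplied by the caller. */
typedef struct {
    int  this_gb;      /* package in force for the rest of this month, 0 = no data */
    long this_price;   /* monthly price of this_gb */
    int  next_gb;      /* package that applies from next month */
    long next_price;   /* monthly price of next_gb */
    int  start_day;    /* day data started this month (1 = from the start) */
    long allowance_mb; /* this month's allowance, assuming 1GB = 1000MB */
} sim_package;

/* Scale a monthly amount to the days from start_day to the end of the month. */
static long prorata(long amount, int start_day, int days_in_month)
{
    return amount * (days_in_month - start_day + 1) / days_in_month;
}

/* Apply a package change requested on `today`; returns the charge in pence. */
long change_package(sim_package *s, int new_gb, long new_price,
                    int today, int days_in_month)
{
    /* Decrease (or same): only sets next month's package. A later request
     * is compared with this month's package, not the lower one booked. */
    if (new_gb <= s->this_gb) {
        s->next_gb = new_gb;
        s->next_price = new_price;
        return 0;
    }
    long charge;
    if (s->this_gb == 0) {
        /* Adding data to a non-data SIM: pro-rata from today. */
        s->start_day = today;
        charge = prorata(new_price, today, days_in_month);
    } else {
        /* Increase: difference in monthly price, pro-rata only if data
         * itself started mid month. */
        charge = prorata(new_price - s->this_price, s->start_day, days_in_month);
    }
    s->this_gb = s->next_gb = new_gb;
    s->this_price = s->next_price = new_price;
    s->allowance_mb = prorata(new_gb * 1000L, s->start_day, days_in_month);
    return charge;
}
```

The key design choice is that a decrease only touches `next_gb`, so a later increase is compared against `this_gb`, which is the "only consider it an increase relative to that" case.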

This is never simple, is it?

Hopefully we have something soon, sorry for the delay, waiting on ONSIM to do the necessary APIs for us.
