2024-07-12

TOTSCO Integration testing done

Well, we managed to complete tests with one CP yesterday, with us giving them a lot of hand holding.

But separately, today, I had a 90 minute Teams call with TOTSCO and another CP and, well, that was it!

We each had a challenge of forcing our systems to create scenarios - faking installations happening when they had not, and so on. And then forcing errors when errors should not happen. That is why it was 90 minutes.

I learned a few things - for example, I had one system sending a datetime rather than a date in one specific case we had not tested before, which meant they (quite correctly) rejected the message. Well done to them. They found issues too, all minor, and all sorted on the call.
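
As an illustration of that class of bug, here is a minimal Python sketch, assuming the field is expected as a plain ISO 8601 date. The helper name as_ots_date is hypothetical and not taken from the TOTSCO specification or our code.

```python
from datetime import date, datetime

def as_ots_date(value):
    """Return a plain YYYY-MM-DD string, whether given a date or a datetime."""
    # datetime is a subclass of date, so check for it first and drop the time part.
    if isinstance(value, datetime):
        value = value.date()
    return value.isoformat()
```

Routing every outgoing date field through one helper like this means a stray datetime in a rarely used code path cannot leak a time component into the message.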

It was nice working with people who had basically got a fully working system, and it was just slight tweaks and edge cases - polishing things a bit - which they were able to sort on the call. All very professional. Thanks guys.

But that was it, a 90 minute call, and TOTSCO confirmed all tests done each way and all passed integration testing.

Next step, production testing, which is ONE MESSAGE... Oooh, I'm scared!

2024-07-09

TOTSCO - making it work

We (A&A) are trying to make One Touch Switching work, honest.

Given the many posts I have done, and that I am clear how I feel it is a stupid imposition by OFCOM, and badly implemented, you may think I am not trying to make it work. But I am.

Testing

I have created a complete proper testing platform, because TOTSCO don't have one. But I have been monitoring, and finding issues that other CPs have. The use of "VOIP" for "NetworkOperator", instead of "A000", fooled me, and I have confirmed the TOTSCO spec is wrong, as is the example in the spec. I asked that question, got an answer (even though they will not update the spec) and have added details and notes on NOTSCO so testers can see the issue and why it is there. I am constantly working, every day, to ensure my test system is right even when that means getting the broken specification clarified.

Helping "buddy CP"

We are working with a CP. We will work with more. We had been told by the boss of the CP that they were ready, and would be happy to test with us. To our (and his) surprise, they are not close. So I have done a lot of hand holding here. We found more errors in the specification as a result. My view is we are 100% ready, and have been for a month. But getting through TOTSCO's insane testing process is painful and broken. And no, TOTSCO are not paying us to hand hold, and test with other CPs?!

Fixing the specification

Every error we have found, we have reported to TOTSCO, and we have tried to get them to fix the spec. They think a change freeze is a good move - it is not! We tried. So the issues are documented on the NOTSCO test system to help other CPs understand.

Going live

I am not convinced by any of the other testing stages, to be honest.

What would I do (ramp up)?

I would have CPs going live, but making clear to customers as part of the order journey that this does not cover all CPs yet and that they can choose not to be part of it. If they say yes, offer a choice of the CPs that are on-line.

This means real migrations and ports with real CPs, increasing as each CP goes live. The deadline would simply be for when all CPs are live.

Babysitting

A key aspect I have included in our system is babysitting. I personally (as the lead developer) am being notified of each OTS message, so I can review it. A larger CP may need a team to do that.

We also make it optional for customers - we have to - they may not want to cease an existing service at all, so the process has to be optional. But it also means that, if there are any issues, we can fall back to a more old school migration or new service instead.

This idea is simple - there will be edge cases - there will be errors - we will have errors - but we can propose changes, review them, and make them live in an agile way. With NOTSCO we can test those edge case errors. Indeed, I can add test cases to NOTSCO for others as a result.

We lack contacts at other CPs. This will be fun when live, as I will raise with TOTSCO every weird or wrong OTS message we get from another CP and ask them to put us in touch with that CP. It would have been neater to provide all CPs with an email address to query OTS issues during deployment. Why is that not a thing?

Ideally we would find all issues during a ramp up, and if that meant changes to TOTSCO specifications, they would be done and notified quickly.

We can make it work!

We can, I am sure, but it is being so badly managed right now (in my honest opinion) it will be a lot more work than it should be.

TOTSCO change freeze

You are approaching a deadline, one that is legally important.

Hundreds of developers are working to meet that deadline. They need to interoperate on the impending deadline.

Your specification has errors, contradictions and vague definitions.

You choose:
  1. Encourage queries, make clarifications, advise all companies of these clarifications and updates in a timely agile way updating the specification.
  2. Change freeze the specification so all companies make their own mind up, get confused, and reach deadline in an incompatible way.

Which did #TOTSCO choose?

2024-07-05

TOTSCO, telcos, a little help please!

There are 47 other companies on the TOTSCO pre-production platform right now. We have been waiting weeks for a buddy CP for testing. We'd love to get more testing with anyone.

Can any one of you spare a few minutes to do some testing?

You don't need to book anything with TOTSCO for this. If you are on the pre-production platform, we can exchange messages, and if we exchange the required messages we can complete this stage of testing.

Try a match request to us maybe? I have set up a line on our system, with these details, for testing.

Service:    IAS
RCPID:    RVWJ
Surname: STARMER
10 Downing Street
LONDON
SW1A 2AA

That should get you a valid match confirmation. Try with surname SUNAK, and you should get a match failure. [I thought it more amusing that way around]

If you get a confirmation, do send a switching order, update, cancellation/trigger as well.

Thanks.

Update: We got a test pretty quickly, which is nice. I got the postcode wrong initially, D'Oh, but the match request included an account number and UPRN. It is a concern that they were included (I did not post a UPRN initially and what was sent was wrong). It suggests the sender expects to send an account and UPRN in all cases, when neither should be mandatory. So interesting test, thank you very much for that.

Comment: Yes, that is all you need to know if there was a broadband service you wanted to port to a new provider under the new system.

Update: I did not include a UPRN or Account number as I would hope CPs can cope without these from a customer. We cope without them in matching. But as TOTSCO don't define it, we also cope if they are present but an empty string!
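
A minimal Python sketch of that tolerance, treating a missing field, a null and an empty string as the same thing. The flat field names "uprn" and "account" are hypothetical, and this is not the actual matching code.

```python
def normalise_optional(fields, optional=("uprn", "account")):
    """Treat a missing key, None, or an empty/blank string as 'not supplied'."""
    cleaned = dict(fields)
    for key in optional:
        value = cleaned.get(key)
        if value is None or str(value).strip() == "":
            # Drop it so later matching code only sees values that were really sent.
            cleaned.pop(key, None)
    return cleaned
```

For example, normalise_optional({"surname": "STARMER", "uprn": ""}) leaves just the surname, so the matching logic never has to distinguish "absent" from "present but empty".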

TOTSCO Tick Tock

I may sound like a stuck record, but I learn more as I go, so updating on this seems sensible. I hope it helps other CPs.

TOTSCO do publish the test process, here. But I'll summarise.

  • A simple connectivity test, to check connectivity, and send some dummy message responses.
  • Integration testing with another CP, exchange a number of messages of different types.
  • Production Implementation testing.

To summarise the problems so far.

Simulator

It is meant to do basic connectivity tests, but did not actually pick up the one issue we had, which was that we were too slow responding (a simple Apache config tweak fixed it). So it failed in its one job.

We could not complete tests as the responses were not valid. Heck, we had to bodge things to even send a message as the simulator does not meet the spec for the URLs used. This meant we would not then send messages to progress a switch order, as they wanted, because we had (correctly) rejected the invalid messages they had sent, and had no switch order to progress. Thankfully that step was not mandatory.

Integration Testing

This purports to be more comprehensive testing, but it has a lot of issues.

  1. It is testing with a buddy CP, but it seems it can take weeks to have one assigned, at random. We short circuited this eventually by agreeing with another CP we know. But they were not ready. What is worse is that we now know that, at the same time as we are waiting weeks, other CPs are waiting as well, which makes no sense?!?!
  2. The buddy CP is doing testing with us, for free(!), if they are ready. They may not be ready. Even if they are, we are doing tests against their interpretation and implementation of the OTS specifications. It is not testing to a reference implementation or against the specification.
  3. They want us to do all 15 message types (plus the 16th, messageDeliveryFailure). One issue is that this is contrived. Our system is designed to send valid messages as part of a switching process, integrated in to our management systems (with just the tiniest tweak for testing to not actually action a cease or migrate at that key point). This means getting a Failure response to some messages will not use our normal system, because our normal system would not send incorrect messages.
  4. They then want at least 1,000 messages - why? These are not real switch orders. It will literally be a repeated match request sent 1,000 times. A totally pointless step.

I have had to make the system allow me to send bad messages in some ways in order to get the Failure responses. This means I am not testing the actual OTS system we have made, I am bodging it! If there was a proper test system, one could set up the bad responses even for valid messages so as to test, and to generate bad incoming messages to test error checking. But if you have two CPs that have set up systems correctly, they would not generate bad messages and therefore prompt error responses as a result. You actually need a buddy CP that is set up to deliberately do testing, for free(!).
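
To show the sort of thing a proper test system could do, here is a minimal Python sketch that takes a valid message and produces deliberately broken copies of it, so the receiving side's error checking gets exercised. The function name and the flat message layout are hypothetical; this is not how NOTSCO or the TOTSCO simulator actually works.

```python
import copy

def broken_variants(message, required_fields):
    """Yield (description, bad copy) pairs with one required field removed or blanked."""
    for field in required_fields:
        missing = copy.deepcopy(message)
        missing.pop(field, None)
        yield (f"missing {field}", missing)

        blank = copy.deepcopy(message)
        blank[field] = ""
        yield (f"empty {field}", blank)
```

Each variant should provoke a Failure response, and the description says which check was expected to catch it, all without touching the production code that only ever sends valid messages.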

This is where we are still - I asked for another buddy CP a few weeks ago, and no joy yet, but the original CP may be closer to being able to do basic tests now. I hope so. It could allow us to finally get past this step.

Production Implementation testing

This is the final step before being able to go live.

  1. We have to book a test slot 8 weeks in advance. Why in the name of sanity would we have to do that? I mean if the test slot meant tying up TOTSCO staff for hours to go through a series of complicated tests, I could understand - but this is the kicker...
  2. The test is one message exchange. Just one. I don't see how this even takes up TOTSCO staff time. It should not. It could be automated - I fill in details - I send a match request to BT or someone, and get a response, done. Why on earth is a test slot even needed in the first place, let alone booking 8 weeks in advance?

2024-06-29

TOTSCO correlationID

RESOLVED! See below!

My latest concern is understanding the TOTSCO specification. It may be that I have mis-read it or not read enough. I am fully prepared to accept I have this wrong. It came up because the buddy CP and I read it differently.

Messages each way have a source and destination correlationID. This is necessary to allow a response to be correlated with a request. An initial request does not need a destination correlationID (indeed, should not have one), but needs a source, and the reply needs a destination correlationID matching that source (and arguably maybe not a source of its own, except it is mandatory per §2.1.5, except it is not per §2.1.8).
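
As a minimal Python sketch of that rule: the reply echoes the requester's source correlationID as its destination correlationID. The nested layout here is a simplification for illustration rather than the exact envelope from the specification, and giving the reply a fresh source correlationID of its own is an assumption, given the §2.1.5 / §2.1.8 ambiguity.

```python
import uuid

def reply_correlation(request_envelope):
    """Correlation fields for a reply to the given request envelope."""
    return {
        # Our own ID for this reply (assumed to be required).
        "source": {"correlationID": str(uuid.uuid4())},
        # Echo the requester's ID so they can pair the reply with their request.
        "destination": {"correlationID": request_envelope["source"]["correlationID"]},
    }
```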

My initial interpretation was that each message type that was a Request would have a response that is a Confirmation or a Failure. And that the Request/Confirmation or Request/Failure would need matching correlationID so the response could be matched to the request, but that was it.

Indeed, all of the messages and responses that progress a switching order also contain a switchOrderReference, so no actual need for correlationID at all anyway in those.

My code would send a Request and wait for a response, using the correlationID to match the response. This is synchronous in the customer order process where the SLA for a match request is 60 seconds. We make the customer wait for the response up to 61 seconds.
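
A minimal Python sketch of that request/response pairing, assuming responses arrive on a separate thread (for example, a web server handler). The class and method names are hypothetical and this is not the real implementation; it just shows one way to wait up to 61 seconds on a correlationID and to ignore late or repeated replies.

```python
import threading
import uuid

class PendingRequests:
    """Match incoming responses to outstanding requests by correlationID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = {}  # correlationID -> [Event, response or None]

    def register(self):
        """Create a fresh, unique correlationID for an outgoing request."""
        correlation_id = str(uuid.uuid4())
        with self._lock:
            self._waiting[correlation_id] = [threading.Event(), None]
        return correlation_id

    def deliver(self, correlation_id, response):
        """Record a response; return False if it is unknown, late or a repeat."""
        with self._lock:
            entry = self._waiting.get(correlation_id)
            if entry is None or entry[0].is_set():
                return False
            entry[1] = response
            entry[0].set()
        return True

    def wait(self, correlation_id, timeout=61.0):
        """Block until the response arrives or the timeout expires (then None)."""
        with self._lock:
            entry = self._waiting[correlation_id]
        entry[0].wait(timeout)
        with self._lock:
            # Forget the ID so anything arriving after this is treated as late.
            del self._waiting[correlation_id]
        return entry[1]
```

The order process would call register(), send the match request with that ID as the source correlationID, then call wait(). The handler for incoming messages calls deliver() with the destination correlationID it received.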

But then I saw the published TOTSCO test cases, and they all had a destination correlationID for the ongoing messages, the residentialSwitchOrderRequest, for example.

This only made sense if the whole sequence, such as the following, were all a single message flow with a consistent set of correlationIDs each way for the whole sequence.

  • residentialSwitchMatchRequest
  • residentialSwitchMatchConfirmation
  • residentialSwitchOrderRequest
  • residentialSwitchOrderConfirmation
  • residentialSwitchOrderUpdateRequest
  • residentialSwitchOrderUpdateConfirmation
  • residentialSwitchOrderTriggerRequest
  • residentialSwitchOrderTriggerConfirmation

If that is the case I have to hold correlationIDs much longer, and associate with ongoing switch orders. I spent many hours re-working the system to do just that. This had issues with the possibility of delayed/repeated messages, which can happen. A reply may be to an earlier message with the same correlationID. I'd far prefer the previous interpretation where each Request has a new and unique correlationID which has to be quoted in the single corresponding response (Confirmation or Failure). It would be simpler and easier. But the test case examples make it clear that this is not the case, which is messy and a lot more work.

I have now asked TOTSCO to clarify. I have not had a reply yet.

So, even though I did all the extra work, I am happy if they come back and say it is for each message pair distinctly. But they must update the specifications and examples and test case to make that clear, as it is a lot more work to track these over a complete (multiple days, weeks) switch order process than over a simple message pair.

For now my code does both - it tracks and uses a consistent correlationID for the whole sequence of messages, but accepts new correlationIDs for each pair of messages if that is what we get.
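
A minimal Python sketch of that kind of tolerant lookup, assuming the switchOrderReference is tried first and a previously issued correlationID is only a fallback. The flat key names and the two dictionaries are hypothetical simplifications, not the real code or message layout.

```python
def find_switch_order(message, orders_by_correlation, orders_by_reference):
    """Locate the switch order an incoming message belongs to, or None."""
    # First choice: the switchOrderReference carried in the message itself.
    ref = message.get("switchOrderReference")
    if ref in orders_by_reference:
        return orders_by_reference[ref]
    # Fallback: a destination correlationID we issued earlier in the sequence.
    dest = message.get("destinationCorrelationID")
    if dest in orders_by_correlation:
        return orders_by_correlation[dest]
    return None
```

Either style of sender then works: one that reuses correlationIDs through the whole journey, and one that issues a new one for every request.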

Update: "The specification does not call for either option to be a requirement, but our expectation and the behavior [sic] we have seen so far in testing is that the second option is being applied by users. There is nothing to stop a CP from wanting to use the same correlation ID throughout a whole switch journey, but the important thing is that they cannot expect their counterpart CP to follow the same behavior [sic]."

This is typically not helpful. If even one CP can expect / require the destination correlationID for a residentialSwitchOrderRequest to be their source correlationID from the previous residentialSwitchMatchConfirmation, then all CPs will have to track correlationIDs through the sequence, else they will not work with that CP. If a CP cannot expect / require that, then no CPs need to do that. The spec needs to say one way or the other. Saying "The specification does not call for either option to be a requirement" is a useless response!

Update: Finally a straight answer - I wasted a day making my code work the same as the test cases, FFS.

"We would like to inform you that, according to the specification, a switch order request is not seen as a response to a match confirmation. Additionally, the TOTSCo hub does not require users to include a destination correlation ID in any request message."

2024-06-28

Will TOTSCO be ready?

One Touch Switching should be live on 12th September. Will the "industry" be ready?

I am not sure.

We are on the pre-production platform now, doing integration testing. There are 47 CPs on the system, including us. And yes, please, any other CPs on there try sending us match requests. And if you need more testing try https://notsco.co.uk/

So I tried sending a match request to each.

The responses were interesting. A lot did respond, which is good, but what is fun is the range of different errors. This is a reflection of how badly the specification has been written. All should have failed to find any service for the name at the address. But the actual error codes and error texts varied a lot. If the specification was good, the response would have been consistent. It is not. Fun!

Quite a few did not respond, fair enough, they may only have their pre-production on line for testing.

Some failed with delivery timeouts, and one with an invalid API Key!

I really am not sure this will all be working. I mean, I think we are 100% ready according to my reading of the spec, and if I have the spec wrong, I am 100% confident I can address that within minutes. But I am not sure of others.

My biggest mistake today was finding Apache had a weird 5 second delay. It seems I am not alone if you google that, and there is a simple fix for it (Content-Length). The CP we are working with may have the same issue, but I am not sure they have the means to debug at the right level to see and resolve it. I'm glad we fixed this, and embarrassed it was wrong.
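
For anyone chasing the same delay, a minimal Python sketch of a reply that states its length up front. It assumes the handler runs as a CGI-style script behind Apache, which is an assumption for illustration; the function name and the default status are hypothetical too.

```python
import json

def cgi_response(payload, status="200 OK"):
    """Build a CGI-style response with an explicit Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    headers = (
        f"Status: {status}\r\n"
        "Content-Type: application/json\r\n"
        # Length is counted in bytes of the encoded body, not characters.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body
```

With the length declared, the server and client know exactly where the body ends and have no reason to wait to find out whether more is coming.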

What is fun is today TOTSCO also failed to meet their own SLA on response times to messages. No reply on that yet.

But all of this is "nuts and bolts" of messaging, and nothing close to the high level issues I fully expect to stem from the whole system. CP to CP messages going wrong has a whole new level of possible issues, and I am not sure we are close to tackling those.

Wow, and one replied after 4 minutes, and replied twice!!! (the SLA is 60 seconds). Their reply had incorrect auditData, and incorrect content in the payload!
