For anyone trying to work out why Microsoft Exchange push notification responses are not being accepted by the server: it took me a while, but it seems that Exchange does not accept a "chunked" response.

We were sending the response from a CGI script under Apache, and that output is normally chunked.

But there is no way to guess what is wrong. There are lots of examples on the Internet, but none worked.

We were sending text/xml with :-

<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<SendNotificationResult xmlns="http://schemas.microsoft.com/exchange/services/2006/messages">

I tried every combination of xmlns tagging and all sorts, and eventually solved the problem by sending with a Content-Length header rather than chunked encoding. That is what it wanted.


But I guess it is in line with the rest of the documentation, which is pretty crap (in my opinion), e.g. one page listing a field as URL in one place and Url in another, and we had the wrong one. You just have to guess what is wrong half the time.

Anyway, this blog post is for those looking for the Wisdom of the Ancients, thanks to XKCD.


BT and their wifi adverts

BT have made some interesting claims regarding their WiFi, with the latest being that it is the "most powerful".

Now this is a rather odd claim, as "power" is not something that is all that relevant - power (measured in Watts) is not that helpful as a measure of WiFi; indeed many smaller APs with lower power can (I believe) provide better coverage and performance. Saying you have the most powerful WiFi is like saying your house has the brightest street light outside it. The main impact is that it will make other WiFi nearby worse.

BT have made all sorts of claims before, all of them (in my view) rather suspect. The claim of most powerful WiFi, when WiFi is a radio data system to strict international and national agreed standards, is rather odd. The WiFi will have the power within the standard and legislation, like any other WiFi. It cannot in practice be more powerful.

They even published a document to justify the claim (here).

First issue: "The BT Smart Hub has superior specifications than the routers of all major broadband providers". So it is only the most "powerful" if you ignore the smaller providers - they only look at "major" providers. AAISP have been offering Unifi APs, and packs of multiple Unifi APs, for some years now, but that does not count, as AAISP is not a "major" provider.

Second issue: The comparison compared many things but not one of them was in fact "power"! They state: "The most important aspect of wi-fi for customers is their Transmission Control Protocol (TCP) throughput.". Whilst this is actually a quite good metric, it has absolutely nothing to do with the claim of being most powerful. Power is in Watts and is not a measurement of download speed.

Third issue: They actually tested the WiFi. This is good as that is what they are claiming is most powerful, but they are selling an "Internet Access Service" using this. The tests are nothing to do with Internet access (you can tell from the speeds they measured) and for most people any speed on the WiFi that is over the speed of their Internet Access is irrelevant, so no help. Yet the advert is to sell Internet Access, not simply WiFi APs.

Basically, they are simply claiming they have a good 3x3-antenna single-AP WiFi system that they sell/provide with the Internet access service they sell, and that it is somehow more "powerful" than other ISPs'. They ignore the other (smaller) ISPs selling systems just as good. They ignore those selling multiple access point solutions, which are better. They ignore all of the non-ISPs also selling this equipment. And they ignore that the actual "power" is the same on these devices, and that their claim of most "powerful" is not actually about "power" at all but TCP throughput.

Anyway, yes, consumers want an Internet access service that is good. If BT are "most powerful" in that, why are many ISPs (including AAISP) way higher in ispreview's list? (here)


Connecting to @AAISP Fast?

Over the last week or so we have had a lot of customers that were with "Fast" wanting to sign up with us urgently, because Fast, rather unceremoniously, dropped all their broadband customers overnight!

The good news is that most of their customers are using a back-haul provider we use, and that means that using our login on their existing line "just works".

There may be exceptions, but we have connected loads of ex-Fast customers same day, once they order we allocate a username and password and allow them in. Some are on-line within minutes.

We had to tweak our systems to allow this. Normally people can only log in once installed, but with some complicated work on the RADIUS server we managed it.

Once the day comes to actually migrate there may be some downtime - some re-jumpering, etc. But overall this is a bonus for all those stranded customers. Getting on-line before Christmas matters.

It did rather put a strain on the staff just before Christmas, and even managed to upset my attack-management system, which thought someone was pushing automated invalid orders at us! Soon sorted.

But, as ever, we'll aim to offer an excellent service to those customers moving to us, and ensure it all works over the holiday period.

Merry Christmas to all our new customers.

Director of A&A


Good news for privacy - Investigatory Powers Act vs CJEU

As reported by the BBC, the European Court of Justice has made a ruling that could seriously impact the powers in the Investigatory Powers Act to collect data on everyone in the UK.

The IP Act has provisions, much like the Data Retention and Investigatory Powers Act (DRIPA) it replaces, and the Data Retention Directive (DRD) before it, to retain data about use of communications systems.

The IP Act actually pushes this much further - previously telcos/ISPs could have been asked to retain certain data they processed (e.g. telephone itemised billing records) but could not be required to actually generate data they were not processing. The IP Act allows much more and it has been made clear that the government wish to log usage of the Internet in some detail - down to the level of recording every web site everyone has accessed. This is far more than just retention of data, and would apply to everyone, even those not suspected of any crime.

The good news is that the ruling from the CJEU is that this sort of mass retention of data is not consistent with our basic human rights and EU law. These apply regardless of whether we leave the EU or not.

The BBC article is not ideal in its analysis, and Open Rights Group have a much better analysis (here).

Retention is an invasion of privacy

The key point of argument here is that the UK Government considered that indiscriminate retention of data should be allowed as long as access to that data was restricted and controlled in a suitable way. The court disagreed: it ruled that indiscriminate retention of data is simply not acceptable. You have to be much more specific about whose data is to be collected, targeting suspects in a crime.

Only to be used for serious crime

The court also looked at the issue of controls over access to the retained data. Again, this did not go well, as access has to be restricted to serious crime only. The IP Act even tries to redefine serious crime to include things that are not serious, so that will have to change too.

Proper independent authorisation of requests for data

On top of that - the access to the retained data should be approved by an independent body, such as a court, and not simply by the current system of a Designated Senior Officer. This could finally mean we see proper court warrants for access to retained data.

No more secrecy

As I have long said, the secrecy around data retention and collection of data is not really acceptable. The ruling says subjects of access should be told about it once there is no longer a risk of prejudice to the investigation.

We can still catch criminals

None of this stops wire taps (or the Internet equivalent) on suspects in serious crime, set up and accessed with the proper controls. All it stops is the indiscriminate logging of everything we all do on the Internet - and that is a good thing - we are all meant to be innocent until proven guilty, after all.

Read more

Read the ORG article for a lot more useful insight into this ruling.


IEC18004 QR Codes

I said I had my mojo back :-) Yesterday afternoon I decided to have a bash at writing a QR code encoding library, from scratch.

Yes, this is re-inventing the wheel as there are QR encoding libraries out there. It was fun, and it is always nice to have source code that is ours, especially if we may put it in the FireBrick (I am looking at making the TOTP logic in the FireBrick a lot easier to use).

Thankfully Cliff had already written a Reed-Solomon ECC generation function for me, and has made me a very simple BCH coding function. Whilst I understand error correction codes in principle, the maths really is just beyond me.

I found a copy of IEC18004 on-line. You normally have to pay for a spec, and I may do so at some point, but the court ruling on reading stuff on-line using your browser makes clear that I am not breaking copyright simply by reading it in my browser - whoever is hosting it is. It is 118 pages long!

What really annoys me about this whole specification is the tables of numbers. Instead of saying that the alignment marks will be evenly spaced, with spacing between 16 and 20 units starting at unit 6, or something like that, they have a table that states the positioning for each size. I played around and worked out a simple algorithm to generate the table, and so did not have to use it - yay. I double checked my calculations, only to find one barcode size does not follow the same logic and is a special case for no apparent reason. Why not just make it a simple algorithm?

You then have the same for the level of ECC coding - rather than saying "medium ECC uses X% of the space for ECC words" and working that out for each size, there is a table, for four different ECC levels and 40 different sizes of barcode. Then the number of blocks used for ECC is not something simple like "use more blocks when the data encoding size is 32 bytes or more"; no, again a table, for all four ECC levels and all 40 sizes. It drives me round the bend. It could be one line of C rather than typing, double checking, and testing hundreds of numbers in a table.

Anyway, in the end, I have myself a nice little library that codes in 8 bit, Alphanumeric, or Numeric (not Kanji, but I could add that I guess). It codes the input all in one format only - I may, later, make something to work out optimal coding of the string changing coding in the middle as needed, like I did for the IEC16022 barcode library I wrote years ago, but I suspect there is no point.

It is very useful having QR readers on my phone to test it, and the reference coding in the specification was really useful too. I like specs that do a worked example like that.

All in all a fun little project for a Monday afternoon.

This was published at the time on the A&A site but is now a GitHub project (here). And yes, we did put it in the FireBrick for TOTP.


Boiling a frog, and old age

We know the story of boiling a frog - you start with cold water and gradually make it warmer, that way the frog does not notice and jump out. [who would do that?!]

Well, I have noticed that getting old is like that. Several times now I have discovered a change in my life that only strikes me when fixed. Being diagnosed diabetic was scary, as only when I was on medication did I realise how much all the symptoms had crept up on me over the year before. These are symptoms I knew to look for, and had been drummed into me by my mother since I was a child, and still they eluded me. Mostly tired and thirsty. I had got used to taking a glass of water to bed - which stopped being necessary as soon as I was on medication. Now I am on insulin, and my diabetes is well under control, or so I thought. Indeed, the annual reviews and HbA1c tests are all good.

The latest example is one where, over time, I have realised that whilst generally feeling reasonably healthy, I was going to bed tired sooner, feeling much more apathetic, and doing less work. If I was up at 9pm there was a joke in my family that it was past my bed time. That should have been a clue. I would sleep for like 9 hours a night, and not do a lot of work during the day.

Then I was put on indapamide as my blood pressure was getting higher, as I blogged recently. The 2.5mg dose was too high and I felt like crap, but now on 1.25mg, I feel better. I realise that since I started on the indapamide I am feeling "better". Over the last few weeks I have designed, coded, and deployed the whole 2FA systems for A&A (four separate systems), whilst also coding a load of other stuff including a Monzo API library and a few other things - documenting it all, and testing it all.

To my surprise, I look back at last week, and realise a couple of days ago I was up at 5:30 am and working solidly until 11:30 at night with no problem, only to be up at 6:30 the next day. I am finding I am bored just watching TV or going to bed, and instead am doing stuff. All last night I was designing in my head new code for a feature on the FireBrick which I ended up getting up and documenting first thing this morning. I feel like I have my mojo back.

The issue is not, as I see it, the blood pressure, which is what the indapamide is for, but that it has changed the way my diabetes is working - I am having to take more insulin, about twice what I was, but I am much more stable now. It will be interesting to see my HbA1c in a few weeks' time. Indapamide is not listed as a treatment for diabetes in any way, just as something that can impact blood glucose levels. What is encouraging is that, having mentioned this on a recent blog post, I find I am not alone: others with diabetes found they were "revived" (which I feel is a really good description) once on indapamide. So maybe it should be a diabetes related treatment?

Now I wonder what the next thing will be - something that will creep up on me over many months before I realise.


Change Freeze

Tricky subject, and the very fact the subject has come up says something about the size of A&A now.

We have a change freeze, which started this afternoon and goes on until after the New Year bank holiday.

The principle is fine - we have a lot of staff off, especially some of the senior technical staff, and none of us want major issues whilst we are at home with family if it can be avoided.

So the idea is we don't make major changes or deploy new systems over the change freeze. Nice idea.

There are, however, a few problems, and it is a change for the way I work for a start.

I am very keen to do a job, finish it, and deploy it - I hate having any job interrupted by a big gap - I lose track of what I am doing, spend a lot of time catching up, and things can be missed. So where there was a job in progress before xmas, I have been rushing to make sure it is all deployed before the change freeze. This is not to say I am taking short cuts as such, but I am rushing. I don't want a half-finished job left undeployed. And no, finishing on a dev system and deploying next year is not good - I like to deploy things as I go and pick up issues and fix them whilst still fresh in my mind. If I did the work and did not deploy for two weeks that would be horrid.

We also have the fact that xmas can be a quiet period from a technical point of view - it is (was) an ideal time to deploy and test changes with lower than usual impact. For a start, a whole bunch of customers are not even there - businesses shutting down. And whilst I don't mean to say business customers are more important than residential, there is a difference between a business customer disrupted in their business for a few minutes and a home customer disrupted whilst eating mince pies. So traditionally the xmas break has been a good time to work on some major projects and iron out the bugs before everyone is back to work or doing anything serious with what we sell.

Ironically, whilst a few months ago I would have been almost happy to sit around doing nothing all xmas break, we now finally have me on medication for my blood pressure, which has had some sort of impact on my diabetes, which means for the last few weeks I feel much more like I did in my 20's - bored of watching TV, and coding from dawn to dusk (well, much later, with it being winter). Seriously, this is great, even if it won't last (5mg perindopril + 1.25mg indapamide, FTW).

We had only one snag with stuff rushed through yesterday, and it was not actually due to rushing at all. It was a VoIP issue - a complicated set of issues where a recent change, which had been tested on several boxes, was deployed as part of an urgent update to address a customer issue. Sadly, when load reached a certain magic level on the live VoIP servers we got drop-outs. Our normal testing on several other boxes did not pick it up, and would not have done so even if, without the change freeze, we had done the update next week instead. Sorry for the inconvenience on that - the VoIP servers are a pain, as reloading means dropping calls, but waiting means people get drop-outs in calls until we do. We managed to move calls and so only dropped a few to get the new code deployed during the afternoon.

But overall I feel rushed by the change freeze, and am not entirely convinced it will help with issues cropping up or not. I guess we'll see over the next couple of weeks whether I go crazy, and/or make a huge list of changes all done on Jan 3rd, and the consequences of that.

If I do have my mojo back, I am damn well going to do stuff, but maybe not A&A stuff. My son has a load of web/app sites that could be tidied up, and my mate Mike has loads of stuff he wants re-inventing from scratch (probably including "the wheel", knowing him). I may find stuff to do.

So, happy freeze everyone.

How not to do 2FA?

We have purchasing cards with Barclays and the statements come in on a web portal.

The site has a secondary question, which I think they pick from various questions that were asked when originally set up.

So, I logged in, and was asked the extra security question, and got it wrong. The problem is that it was "What is your mother's middle name?". This is a horrid question! (a) not everyone has a middle name, and (b) she has two of them. So I forgot what I had put originally and got it wrong.

So, locked out. Great.

I have spent literally a month trying to get it sorted, with email replies taking a week, and eventually, after a lot of phoning and getting our business relationship manager to chase, the login was finally reactivated. Not a good user experience at all.

Same question, same mistake, locked out again, arrrrg!

OK, one more time with the shouting and chasing, and what do I get?


Yes, an unsigned, unencrypted, plain text email with a plain text password quoted that is valid for 2 months! (Yes, I have changed it).

Anyway, this time I guessed the right answer to the question.

To be fair, a password reset process is tricky. We send a link valid for a few hours, but that too is as good as plain text in a way, as someone intercepting it could use it. It just seems so very wrong sending a plain text password by email somehow. I am glad we are setting up the proper 2FA stuff on our systems.

Even so, this looked so much like some sort of spam I nearly deleted it.

OFCOM Broadband USO

Arrrrg! This is a knee jerk reaction - I'll do a proper response to them shortly.

OFCOM published guidance to Government on technical specification for the Universal Service Obligation for broadband. Here (PDF).

In it they have three scenarios:-
  1. a standard broadband service, characterised only by a 10Mbit/s download speed; 
  2. a more highly specified standard broadband service, adding upload speed (1Mbit/s), latency (medium response time), maximum sharing between customers (a ‘contention ratio’ of 50:1), and a defined data cap based on current usage profiles (100GB per month); and 
  3. a superfast broadband service, with download speeds of 30Mbit/s, upload of 6Mbit/s, fast response times, a ‘committed information rate’ of 10Mbit/s (i.e. guaranteed 10Mbit/s at all times) and an unlimited usage cap.
Now, I am pleased there is mention of latency, but no mention of the latency being measured when idle (i.e. not when a queue on the customer's own router adds latency), and no mention of packet loss when idle. They also don't define where latency is measured.

But again, they talk of "speed" and "contention" and even "committed" rates.


Speed (of what, and to where?)

So, let's start with speed. In the industry and since the days of modems, this has always been the modem link speed (or radio link, etc). It is the raw speed of the link (and usually before overheads like ATM, but that is just a matter of percentage adjustments). It is the speed from the modem at the customer premises to or from the modem at the other end of a wire or radio link.

This speed is important as it was usually the key factor in the user experience - but times have changed. These days users may find that they have an 80Mb/s VDSL sync, but cannot download some film or game at more than a few Mb/s. This could be down to congestion from cabinet to exchange, from exchange to BRAS, from BRAS to ISP, from ISP to internet, within the internet backhaul, or at the serving end. Only some of these factors are within the ISPs control, and some are within the back-haul carriers control. Often you find you cannot fill your 80Mb/s because of other factors. So are OFCOM actually only talking of sync speed still?

They do mention speed as being between the ISP access network and the premises. But this is not a lot of help, as one could have the "ISP access network" in the exchange, and/or contention within the ISP. If they are talking of Openreach here, it would be to the exchange only. They ignore congestion in the Internet back-haul or at the serving end of communications. They also talk of it varying depending on contention in the network, when actually the issue is congestion. They also talk of it varying depending on home wiring - but that is not between the ISP and the consumer premises; it is within the premises and not the ISP's job!

Committed speed?

This is special. The back-haul providers may offer committed speeds in parts of their network, and will charge a lot for it. In practice it is simply not needed, as both BT and TT back-haul are generally not congested. But you, once again, have to ask where this 10Mb/s is committed to.

On the modem to modem link, if the sync is over 10Mb/s then you have a committed speed of that sync all the way to the other end of the wire where the other modem is, simple. Does that meet the requirements, or does this commitment have to go further?

What about to the ISP? Well, as an ISP we have lots of people on what we would call "super fast" links. But we do not buy 10Mb/s times the number of such lines. We buy capacity to avoid being the bottleneck (and we are better than many ISPs in that respect). Where the average download of customers at peak time is around 0.5Mb/s, the capacity we buy is obviously of the order of 0.5Mb/s times the number of lines, plus a bit of headroom. Buying 20 times that will not make anybody's downloads any faster; it just means we charge 20 times as much to cover the extra costs (well, not quite, but a lot of the price is the back-haul bandwidth). It makes no financial sense for an ISP to dedicate 10Mb/s on a service where the usage is way less, even if a back-haul provider would offer that as a service. And the back-haul providers have the same model - getting a dedicated 10Mb/s would be stupidly expensive. I may price it up on BT in my reply to OFCOM. But it is silly - why have a service offering that needlessly costs 20 times what is required, offers no extra performance, and provides just the nice feeling of links at only 5% utilisation?!
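
The capacity arithmetic above can be sketched in a few lines. This is a toy model only (Python, hypothetical line count, 20% headroom figure assumed for illustration; the 0.5Mb/s average and 10Mb/s committed rate are the post's numbers):

```python
# Toy back-haul capacity model: capacity bought scales with average peak
# usage per line, not with a per-line "committed" rate.
def capacity_mbps(lines: int, avg_peak_mbps: float, headroom: float = 1.2) -> float:
    """Total back-haul capacity to buy, in Mb/s."""
    return lines * avg_peak_mbps * headroom

usage_based = capacity_mbps(10_000, 0.5)            # ~6,000 Mb/s with 20% headroom
committed = capacity_mbps(10_000, 10, headroom=1.0)  # 100,000 Mb/s if 10Mb/s were dedicated
print(usage_based, committed, committed / (10_000 * 0.5))  # dedicated is 20x actual usage
```

The last figure is the "20 times" multiplier in the paragraph above: dedicating 10Mb/s per line buys twenty times the capacity that real peak usage calls for, with no visible benefit to the user.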

But what then - this dedicated 10Mb/s gets to the ISP. Maybe the ISP even buys transit and peering links to accommodate that - what then? That adds again to the expense, meaning even a small ISP like us would need many 10Gb/s external links - around 20 times what we need for customer traffic.

Well, then there is the Internet. Let's say there is a web site on the Internet. Let's say that server has a 1Gb/s link to the Internet. Let's say we look at UK only, and there are what, 4 million people in UK that can get super fast broadband? If they all have 10Mb/s dedicated, that one web site needs a 40Tb/s link to the Internet to ensure all of the UK super fast users have a dedicated 10Mb/s to that one server...
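
The arithmetic behind that 40Tb/s figure, using the post's numbers:

```python
# 4 million "super fast" users each with a dedicated 10Mb/s to one server.
users = 4_000_000        # UK users able to get super fast broadband (post's figure)
dedicated_mbps = 10      # hypothetical dedicated rate per user
total_tbps = users * dedicated_mbps / 1_000_000  # convert Mb/s to Tb/s
print(total_tbps)  # 40.0
```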

But that is crazy, clearly. This is not a USO on web sites, is it? Well, if not, then what is the metric for? Where does that 10Mb/s have to go? End users are buying Internet access, so anything short of 10Mb/s dedicated to their favourite web site is not actually 10Mb/s dedicated, is it? Is 10Mb/s dedicated to their street cabinet acceptable? This is going to cause customer confusion.


Contention (between which points?)

Contention is no longer a sensible measure, and even when it was, it only made sense if you specified the two points between which the contention is measured.

Specifying 50:1 makes no sense. For a start, it is massively different if you have 50:1 or 5000:100. If you have 10Mb/s and there are 49 others sharing a 10Mb/s link, then two people downloading at once get 5Mb/s each and see "slow" Internet. If there are 5000 people sharing a 1Gb/s link, then you need 100 other people downloading at 10Mb/s before the link is even full, so access seems much faster.
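
A toy calculation (Python, numbers taken from the paragraph above) showing why the same ratio behaves very differently depending on the size of the shared pool:

```python
# Per-user throughput when a given number of users download at once
# over a shared link - same contention ratio, different pool sizes.
def share_mbps(link_mbps: float, active_users: int) -> float:
    """Fair share of a shared link among the users downloading right now."""
    return link_mbps / active_users

# 50 users on a 10Mb/s link: just two simultaneous downloads halve the speed.
small_pool = share_mbps(10, 2)       # 5.0 Mb/s each
# 5000 users on a 1Gb/s link: it takes 100 simultaneous full-rate
# downloads before the link is even full.
large_pool = share_mbps(1000, 100)   # 10.0 Mb/s each - still full line rate
print(small_pool, large_pool)
```

Statistically, the bigger pool almost never has that many simultaneous downloads, which is why large shared links feel uncontended in practice.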

In practice, what matters is a capacity per end user that reflects normal end user access rates at peak times, e.g. 500kb/s (as it seems now). That means you normally have uncongested links. That measure needs to change over time. You need links that are many times the minimum end-user speed, so if 10Mb/s is the minimum, links of the order of 1Gb/s. Specifying contention is not the way to do it.

But also, where is that contention measured? Once again, modem to modem is 1:1 and not shared. Cabinet to exchange will have some contention, but if it is not congested the contention does not matter. Contention in the back-haul and back to the ISP is another factor.

Also, with line speeds going up so much, contention is more problematic. If a line is 80Mb/s at 50:1, then that means an average capacity of 1.6Mb/s per user, when in fact all you need is around 0.5Mb/s. But if users are on 10Mb/s you only get 0.2Mb/s per user, which is poor these days in the Netflix generation.

We still come back to this being an Internet service, and so for that web server with a 1Gb/s link, accessible by a billion people around the world, what is the contention ratio?

They also talk of contention "at a node" but do not say at what node, or why they have picked that node rather than any of the other points elsewhere between some Facebook server and the end user's laptop. Again, user confusion.


It does look like they are talking about just the access network, and maybe back-haul, here, but I do not think the document makes that at all clear. I assume they hope that if good access networks exist then Internet access will simply follow and not be an issue. That may be true in some respects, but ISP models depend on the pricing of the back-haul, and that is expensive, so some ISP models involve congestion at the ISP. That model is not helped by this USO at all.

I am not sure they are addressing user confusion on this at all - even someone paying (a lot) for a 10Mb/s committed rate cannot expect to download at 10Mb/s from their favourite web site, but they probably will expect that from the ISP somehow if that is what they are paying for.

Bear in mind, one of the providers here is Openreach, and they only operate from the master socket to the exchange. So do they meet the USO if the speed to/from the exchange meets this spec? What of BTW/TT back-haul? To be fair, that is usually fine, but it won't meet defined contention or committed speeds without stupidly big links from them to the ISP being mandated somehow.

It needs work!


What we did in the end for A&A 2FA

The system is OATH/TOTP: 6 digit, 30 second authenticator codes, set up by QR code. We have TRNGs we use to generate the seeds, which are 320 bits long.

On the accounts system we have gone for some flexibility. There is an option to send codes by SMS instead, which is configurable, along with a configurable trust level to decide when to ask for a code. It is also a seed we hold, so staff can ask for a code to check you are who you say you are (a useful feature on the phone, irc, web chat, etc.).

On the control pages (and the internal staff A&A systems) we have gone for an encrypted TOTP seed and no SMS option. The seed is binary data, XOR'd with a stretched Argon2 hash of the password and a salt set for that purpose (i.e. the seed also has a random salt for its encryption), so there is no way to check you have the right answer other than doing the Argon2 hash and checking an authenticator code - so it is not a shortcut to cracking the password hash.
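
The idea can be sketched as below. This is a minimal illustration only, not A&A's actual code or parameters: Python's standard library has no Argon2, so `hashlib.scrypt` stands in as the stretched hash, and the cost parameters are arbitrary.

```python
# Sketch: encrypt a TOTP seed by XOR with a key stretched from the
# password and a per-seed random salt (scrypt standing in for Argon2).
import hashlib
import os

def encrypt_seed(seed: bytes, password: str, salt: bytes) -> bytes:
    # Stretch the password into a key exactly as long as the seed.
    key = hashlib.scrypt(password.encode(), salt=salt,
                         n=2**14, r=8, p=1, dklen=len(seed))
    # XOR seed with key; the result reveals nothing without the password.
    return bytes(a ^ b for a, b in zip(seed, key))

decrypt_seed = encrypt_seed  # XOR with the same key stream reverses it

salt = os.urandom(16)
seed = os.urandom(40)  # a 320-bit seed, as in the post
enc = encrypt_seed(seed, "correct horse", salt)
assert decrypt_seed(enc, "correct horse", salt) == seed
```

Note that a wrong password simply yields a different (wrong) seed, so the only way to test a guess is to run the full stretch and then check an authenticator code - which is the property described above.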

This means that on the control pages a password change needs the old password if you have 2FA set up, and expects an authenticator code as well. Some staff can override this, but they will also look at account settings as part of deciding that you are you!

I think, overall, we are doing well. Hashed passwords and 2FA with encrypted 2FA seeds.

There is always more to do, and more security to add, but this is an ongoing process.

Customers can now set up 2FA on A&A accounts and control pages if they wish - have fun.


Copyright and links

One of the rather annoying and weird bits of copyright legislation is that it does not just protect against copying but also making available. The second part is the tricky one, and a CJEU ruling recently muddied the waters only to be made worse by a case in Germany!

More on this from arstechnica.

The upshot is that if a commercial web site simply links to a web site/page that contains infringing material then the linking site becomes infringing as it is making available.

What is worse is that the link itself (according to the Germans) does not have to be for some profitable purpose - simply a link on an otherwise commercial site. Also, the web site operator that created the link does not even have to know the linked-to site is infringing. Indeed, they could have done all due diligence when making the link, and the content of the target site has since changed, or worse, not changed but the infringement status of what is there has (a time limited licence on use of an image, maybe). The German case was a link to something covered by a creative commons licence, but it turned out the linked-to site did not meet all of the requirements, making it an infringing site. How the hell does someone making a link check that shit, and how can web sites continue to exist if every external link on every web page needs a small team of lawyers to check it, and recheck regularly?

This is, of course, crazy, and makes no sense to anyone looking at it in any technical way. It makes little sense to those looking at it from a legal viewpoint either, as far as I know.

One of the problems is that the linking site being infringing means that anyone linking to the linking site is also infringing, and so on until the whole world wide web is infringing! If it was not like that, then you do not tackle the original problem - people actually linking to where one can download dodgy copies of stuff can simply link instead to a site that then links to where you can download dodgy copies of stuff, ideally with that intermediate site being outside of EU (and this crazy legislation).

Of course, you could make a non-profit site that links on to other sites, like a URL shortener, but it could be simpler and just strip its own domain off the link - very noddy. Then make all external links on your site go to that non-profit site, which links on to the target page. That way all your links are now laundered and not infringing?

I am not sure how either of these points plays out legally.


The TOTP seed storage dilemma

When making any sort of login system you typically use a username and password. One of the key things you should do, if at all possible, is hash the password.

This means that we do not know the password people have used! We can check a password attempt against the hash (and then forget the password), but we never store the actual password.

If the user database were compromised in any way, the attackers would not get real passwords, and so could not use them. More importantly, given so many people re-use passwords, it does not give them the passwords people are using on other systems.

So far, so good.

In addition to passwords we are now using two factor authentication with an authenticator app that provides a new 6 digit code every 30 seconds (Time-based One Time Passwords, or TOTP). This uses a seed, a long random number, which is stored in the app and which we have to know as well - that way we can generate the same code and check it matches. We actually generate the codes for the last few minutes and check the attempt matches any of them.
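For anyone curious what "generate the codes for the last few minutes" looks like, here is a minimal sketch of standard TOTP (RFC 6238) with a check window. This is just an illustration of the technique, not our actual code, and the window size here is an assumption:

```python
import hashlib
import hmac
import struct
import time

def totp(seed, timestep=30, digits=6, at=None):
    """RFC 6238 TOTP: HMAC-SHA1 over the time-step counter, then dynamic truncation."""
    counter = int((at if at is not None else time.time()) // timestep)
    mac = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of last byte picks the 4-byte slice
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

def check_code(seed, attempt, window=4):
    """Accept the current code or any of the previous `window` 30-second steps."""
    now = time.time()
    return any(hmac.compare_digest(totp(seed, at=now - i * 30), attempt)
               for i in range(window + 1))
```

Checking a few steps back allows for clock drift on the phone; comparing with `hmac.compare_digest` avoids leaking information through timing.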

This creates two factors - one is something you know, being the username and password, and the other is something you have being the smartphone app or authenticator device. There are, of course, issues of ensuring the two have the same seed, but using a QR code on an https page seems a good compromise.

The problem is that we have to be able to see the seed to check the code, so it is not hidden. If that seed gets out then the authenticator is compromised as someone else can always generate the same codes. So in an ideal world we do not want to be storing this seed in the clear.

The small revelation I had this morning was that we could simply encrypt the seed with the plain text password. This means that when you provide the password and authenticator code, we can check the password and, knowing it is right, use it to decrypt the seed and check the code, all in one go.
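As a sketch of the idea (not something we have implemented, as the rest of this post explains), encrypting the seed under a key derived from the plain-text password might look like this. The scrypt parameters and the SHA-256-based keystream are illustrative stand-ins; real code would use a proper authenticated cipher (AES-GCM or similar) from a crypto library:

```python
import hashlib
import os

def _keystream(key, n):
    # Toy stream cipher: SHA-256 of key||counter blocks. Illustration only;
    # production code should use an AEAD cipher from a real crypto library.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt_seed(seed, password):
    # Derive a key from the password with a slow KDF, then XOR the seed.
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    blob = bytes(a ^ b for a, b in zip(seed, _keystream(key, len(seed))))
    return {"salt": salt, "blob": blob}

def decrypt_seed(rec, password):
    # Same derivation; the right password yields the original seed.
    key = hashlib.scrypt(password.encode(), salt=rec["salt"], n=2**14, r=8, p=1)
    return bytes(a ^ b for a, b in zip(rec["blob"], _keystream(key, len(rec["blob"]))))
```

The point is that the stored record alone is useless without the password, which is exactly why the password reset process then becomes the problem.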

This is great, and I nearly rushed off and implemented it before realising a significant number of shortcomings with this approach.

One big problem is lost passwords. When changing password you always have to know old and new passwords so as to decrypt the seed with the old one and re-encrypt with the new one. This is fine for a change password form, but not for any sort of lost password reset process.

Indeed, at present, we use the authenticator code to validate the password reset request that is sent by email (so a working email plus the authenticator code as two factors). However, if the authenticator code cannot be validated without the password, you cannot do that. You either have to trust just the email working, which is not ideal, or you have to find some other validation process as well. Also, when resetting a password you have no choice but to issue a new authenticator seed and set up the app again with that new seed.

You also cannot use the code as a validator when talking to staff, as they could not check the code unless they also had the password, and we would never want to ask a customer for their password.

So this creates a trade-off: transparent storage of the seed on our systems, with added convenience and some extra security on password reset, or an encrypted seed on our systems, with much more constrained processes and less convenience.

I can see we may end up with the underlying libraries allowing both options, used for different systems as appropriate.

Isn't security fun some times :-)

P.S. Read the comments - at least one important point I had not realised.


Learning to code

I have to say I am slightly impressed and pleased with my son's abilities in coding today.

He has been working on tools to handle transaction details from Monzo, specifically money loaned and repaid. I have written a load of C code behind the scenes for him to fetch the transaction data and run a script of his.

The fact Monzo have an API is great for this, and the real-time nature of the web hooks is even better. They lack an "update" web hook, but all in all it is pretty impressive.

Now, what struck me as insightful is a couple of questions James asked, and importantly he asked them before he hit the problems they would cause!

1. He realised that if he happens to be set up to track Monzo accounts for both sides (him and his girlfriend), he will see the same transaction twice, and hence record the transaction twice, once from each side. But there are scenarios where he is not tracking both sides and so only sees it once. He needs to de-dup the transactions. I am impressed he realised this, and he has managed to find a reference in the transaction which is the same on both sides (a p2p transaction id). So he can de-dup that. He even worked out that matching by amount and parties and time is a really bad idea to de-dup this stuff.

2. He also realised that he might get the two sides of a transaction concurrently and so code that checks for it existing, and if it does not, goes on to add the transaction, could happen concurrently and add both. This can be solved in many ways from locking around the transaction handling script, to table locks in SQL, to unique keys on tables. But the fact he worked out it was a risk is excellent. So many people do not understand "real time" coding and race conditions, and he spotted this as an issue, and once again did so before it happened.
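The unique-key approach can be sketched like this, with a hypothetical loans table keyed on Monzo's shared p2p transaction id (the table and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE loans (
    p2p_id TEXT PRIMARY KEY,   -- the shared peer-to-peer transaction id
    amount INTEGER NOT NULL)""")

def record(p2p_id, amount):
    # INSERT OR IGNORE makes check-and-add atomic: if both sides of the same
    # transaction arrive concurrently, the unique key rejects the second copy,
    # so there is no window between "does it exist?" and "add it".
    cur = db.execute("INSERT OR IGNORE INTO loans (p2p_id, amount) VALUES (?, ?)",
                     (p2p_id, amount))
    return cur.rowcount == 1   # True only for the copy that actually inserted
```

Pushing the de-dup into the database's unique constraint solves both problems at once: the duplicate transaction and the race condition.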

So, well done, my mini-me is growing up.


2FA on A&A control pages

I think we have 2FA sorted nicely on our accounts pages for A&A.

We have taken on board some constructive comments, and done things like "you can't set paranoid mode until you have confirmed that the app has been installed and used a code", and also not showing the QR code or seed again, once installed.

The trick to both of these was to use a different seed for codes sent by SMS than for codes from the authenticator - that way we can tell if the authenticator has in fact been used and not just an SMS'd code. Obviously we did have to allow for the remote chance of a code matching both seeds, and ask for a new code in such cases.

I even made a video showing how to set it up!

The next step is 2FA on our control pages. But first it is worth explaining - yes, I know that really what you should do is identify a specific living individual in some way, and separately have associations of accounts or logins that they are allowed to access in various ways. The system we have does not do that, I know - we have the accounts logins and the control pages logins. The system we have also has a complex set of linkages such as dealer logins on control pages, and, of course, staff logins. Changing to a "single sign on" is a good idea, but a big step we'll tackle another day.

The plan is to use the same library and tools, and almost identical set up processes, as the accounts 2FA system. Indeed, some of the pages/scripts will simply be copied and amended for the different database structures in use on the control pages.

There is, however, one extra trick we can do, and it will be an extra button. As we have the "seed" for the 2FA on the accounts, we can simply have an option to "Copy 2FA from account" for an associated control pages login - why not? Obviously we will ask for password and a new code at the time to confirm you have the authenticator, and there is no reason to show a QR code when doing this. But this would reduce the number of authenticator entries you need installed and will be ideal for cases where it is one person handling accounts and technical issues - like most of our home customers.

This would just be an option though - customers can have separate control for accounts and technical, and have separate people and hence separate authenticators for the control pages if they want.

I hope that sounds sensible. However, I do plan to "let the dust settle" a bit, and see how the accounts 2FA works out before working on the 2FA on the control pages. Feedback welcome, as always.


Security is a battle on two fronts

As an ISP, A&A are obviously quite concerned with security. Many ISPs have had leaks, and we hope never to be among them.

But "security" is far from being an absolute. It is a battleground, and whilst some part of that is the general battle for privacy and the stupidity of UK law, the main battles we face fall into two areas...

1. The bad guys
2. The users

New hashing algorithm

The battle with the bad guys is hard and never ending. Every step you take is a mitigating factor. We have a dedicated ops team, and part of their remit is security, so they are constantly finding things we should improve to meet best practice, or more often go beyond it, if we can.

Some time ago we instigated a password hashing improvement programme. At the time, the password hash competition was not complete so we went for a heavily salted SHA256 hash. However, the main change at the time was not choice of hash but choice of a system of automatic upgrade. All of the systems we use, where possible, not only use the latest preferred hash but update the hash we have when someone next successfully logs in.

This meant that at the time our accounts and control pages and several internal systems moved to that SHA256 hash very quickly. However, we knew that SHA256 was not the best approach, as it is a cryptographic hash and not a password hash. There are different types of hash with different objectives: a password hash is designed to be time and memory consuming, whereas a cryptographic hash is designed to be quick but impossible (for some values of impossible) to reverse.

So, having put this all in place some time ago, we recently moved to the competition winner, Argon2. This is a specifically designed password hash, so any successful login will move your hash to it.
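The upgrade-on-login pattern is the interesting part, and can be sketched like this. I have used scrypt from the Python standard library as a stand-in for Argon2 (which needs a third-party library), and the stored-hash format is invented for the example; the point is the pattern, not the particular hash:

```python
import hashlib
import hmac
import os

CURRENT = "scrypt"   # stand-in for Argon2: whatever the current preferred scheme is

def make_hash(password, scheme=CURRENT):
    # Store scheme, salt and hash together so old and new formats coexist.
    salt = os.urandom(16)
    if scheme == "scrypt":
        h = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    else:  # legacy salted SHA-256
        h = hashlib.sha256(salt + password.encode()).digest()
    return f"{scheme}${salt.hex()}${h.hex()}"

def check_and_upgrade(stored, password):
    # Verify against whatever scheme the stored hash uses; on success,
    # re-hash with the current scheme so the database upgrades itself
    # as users log in, with no big-bang migration needed.
    scheme, salt_hex, h_hex = stored.split("$")
    salt = bytes.fromhex(salt_hex)
    if scheme == "scrypt":
        h = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    else:
        h = hashlib.sha256(salt + password.encode()).digest()
    ok = hmac.compare_digest(h.hex(), h_hex)
    return ok, (make_hash(password) if ok and scheme != CURRENT else stored)
```

A login is the only moment the plain-text password is available, which is why that is the only moment the hash can be upgraded.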

Why is this important? Well, it relates to the risk that our database is ever compromised. Obviously we work hard to avoid that, but if it happens then the hashes will not easily be crackable to find passwords. That is the plan. Not all systems allow passwords to be held as a hash, but our various web site logons do.

But tackling the bad guys is not all technology - there is also the social engineering. The call we get at 10 past 5 from someone who is as nice as pie, visiting his parents' house for his father's funeral, does not have the details, and just needs this minor change done over the phone. The bad guys are good at this shit, and we have to be vigilant whilst somehow also allowing for the customer who genuinely is in that situation!

But that leads me on to the other battle - that with the users (aka customers).

Bear in mind some ISPs store such passwords in plain text and show them to support staff!

Battling customers?

OK, that sounds unfair: we should never be battling our customers; the real battle is with human behaviour. People will re-use passwords or use dumb passwords. This is why our password setting system makes it hard not to accept the pre-set random one we offer. We don't quite make it impossible to pick your own password, because we know of people that run password apps providing individual and very secure passwords to use on our system. Sadly, ultimately, we cannot tell such people from those that want to re-use a password or set it to "PA55W0RD".

But even then the battle is more subtle - people will store passwords in their browser and then get hacked and passwords collected. People are inherently lazy, and we all know it. We are all the same.

So, our latest initiative is allowing two factor authentication on our accounts system web login. This is an extra step that is not possible (in theory) to store in the browser - a code that changes every 30 seconds from an app or device you have, usually a mobile phone app.

But as part of that we need to also allow people to have trusted browsers that stay logged in, or trusted browsers that do not ask for the code (usually). If we ask for the code every time people will turn off the feature. Remember, people are lazy. Security is always a compromise with convenience.

So we ended up allowing a paranoia setting. Customers can, if they wish, set it so the code is needed every time, and so that even staff will not talk about the account unless you can quote the code over the phone (or irc, or web chat, or whatever). But people can set less severe modes where the code is needed on login only if it is not your usual machine or there has been a recent bad password entry.

We have decided that if you have set up 2FA then we do insist on the code on all orders, even if over the phone. But ordering is rare enough that people can cope with that, we think. The whole 2FA remains optional.

We think we have the right balance of convenience and security now on the accounts web site. The next step is our "control" pages. But behind the scenes we are working on more systems to improve security all the time.


Investigatory Powers Act - devil in the detail

It is published (here). It is an interesting read, so here are some initial observations...

I have been trying to focus on the bits that could impact us (A&A and FireBrick) mainly, and I am very happy to have had help from a friendly lawyer on this matter. I am the first to accept that I am not an expert on reading legislation, but getting better as the years go on.

So, some observations, in no particular order...

Can a retention order be placed on BT Wholesale to monitor A&A traffic?

We think no - surprisingly. This is because of 87(4): "A retention notice must not require an operator who controls or provides a telecommunication system (“the system operator”) to retain data which relates to the use of a telecommunications service provided by another telecommunications operator in relation to that system".

So that should mean, we think, that BT Wholesale or Openreach or BT plc as "the system operator" cannot be ordered to retain data which relates to the use of the telecommunications service provided by A&A in relation to that system. We see that as meaning BT provide PPP and we provide IP, and so BT cannot be ordered to log IP (or above), only PPP which is basically their RADIUS logs, because IP is related to what we provide via that system.

Good and bad. The good is that, in theory, if we say we have no monitoring (we don't) and we can assume BT do not, then there is no monitoring (the same logic applies to LINX and transit providers). The bad news is that they may be more inclined to ask us, as a niche ISP, to do retention.

But it gets more fun - given that this now covers private as well as public telecommunications services, it is easy to say that every single one of our customers is a telecommunications operator, even if only running one router to provide service to one person. So we can argue that we cannot be expected to retain data relating to our customers' use of the IP - you have to ask each and every one of them to retain data, not us.

We'll see how that plays out if ever we are asked to do retention (which we, A&A, have not been).

Can FireBrick be forced to add a back door?

We think no, thankfully. The definition of a telecommunications operator, which we thought could cover FireBrick, would require that FireBrick is providing a "service" (we are not, we are providing a product) and that the FireBrick itself is a "system" (it is not, it is apparatus).

Even so, we still have a standing order that if asked to back-door FireBricks then the UK company FireBrick Ltd would be dissolved.

In short, you can trust FireBrick!

Is FaceBook a telecommunications operator?

Well, this is tricky. The Home Office think so, apparently. An operator offers "services", and a service means a service consisting of access to, or facilitating the making use of, a "system". A system is something allowing transmission of communications by electrical or electromagnetic energy.

So a system is wires and fibres and radio; a service provides access to, or facilitates making use of, that; an operator offers such a service.

I think the wires, and fibres, and radio, facilitate the use of FaceBook, not the other way around. The "make use of" may be the sticking point.

I think it is badly drafted! FaceBook may want to argue on that definition.

What are Internet Connection Records?

Something much hyped in the process of this becoming law, but relegated to a small part of the Act.

It is a narrow and specific definition, "In this Act “internet connection record” means communications data which may be used to identify, or assist in identifying, a telecommunications service to which a communication is transmitted by means of a telecommunication system for the purpose of obtaining access to, or running, a computer file or computer program, and comprises data generated or processed by a telecommunications operator in the process of supplying the telecommunications service to the sender of the communication (whether or not a person)."

So it is just stuff to identify the service used by the sender, nothing more. But why does this narrow definition matter?

Well, retention can cover all sorts of data, anything that is not "content", which is "meaning of the communication". And that can be way more than ICRs. It is clear that ICRs are a subset of that data.

However, requests for this data to be acquired (e.g. from retained data) can cover anything.

There are restrictions on "local authorities" getting ICRs, but ICRs are only a subset of the data ISPs may be forced to collect, so that is a less than useful constraint. Local authorities could ask for all sorts of non-ICR data an ISP was required to "retain"!

How serious is "serious crime"?

Some aspects of the acquisition of data have restrictions to "serious crime", and that covers stuff with long prison sentences. Good. But, oddly, the section also covers "relevant crime", which is rather fun as it covers offences "by a person who is not an individual, or which involves, as an integral part of it, the sending of a communication or a breach of a person’s privacy." This means things like failing to put your company number on your letterhead (a crime by a company) is lumped in with "serious crime"!

And the irony that you can get all this data which is a huge invasion of privacy to investigate a breach of a person's privacy is not lost on me.

Can the food standards agency get browsing history?

Well there are caveats, but yes, they are in the list and not even covered by the "local authority" exception to getting ICRs.

Does this mean back-doors can be mandated?

Well, yes: any "service" can be ordered to maintain a capability to decrypt stuff, and even to give notice if new services are planned, to ensure they have the back-door.

But not if you do the encryption yourself, using PGP or your own apps or pen and paper! Criminals can do this and do so legally with no interference by this Act. Well done!


Two factor authentication

I am working on some new two factor authentication for our systems.

Before I even started this, I actually updated the systems we have in place for managing password hashes to move to the Password Hashing Competition winner, Argon2. The hash is updated on next login to our various systems.

However, a big step forward would be two factor authentication where in addition to a username and a password we ask for an extra bit of information.

From various research, the way to do this is TOTP (a time-based OTP using the OATH HMAC hashing system). Basically you have an app or device that provides a code every so often, and when you log in you have to enter the current code. We are using the default of 6 digit codes created every 30 seconds.

There are quite a few issues with this, and a scarily large number of OTP and OATH and TOTP applications available. It is a well published standard.

The challenge is getting the "seed" or "key" into the device, or, if it is a hardware device, from the device to us. The latter is something to tackle later; most people use a mobile app these days, so we make the seed/key and it has to get into the app.

The answer is a QR coded URI, and there is a "standard" for this. It encodes the settings in a standard format, with the seed/key in BASE32, which can be read by various apps including Google Authenticator. Once read, the app provides a code every 30 seconds.
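Building that URI is simple; here is a sketch of the de-facto "Key Uri Format" understood by Google Authenticator and friends (the issuer and account names are placeholders):

```python
import base64
from urllib.parse import quote

def otpauth_uri(seed, account, issuer):
    # The seed goes in as unpadded BASE32; the resulting URI is what gets
    # rendered as the QR code for the authenticator app to scan.
    secret = base64.b32encode(seed).decode().rstrip("=")
    return (f"otpauth://totp/{quote(issuer)}:{quote(account)}"
            f"?secret={secret}&issuer={quote(issuer)}&digits=6&period=30")
```

The digits and period parameters shown are the defaults; most apps ignore anything non-default anyway, which is one reason to stick to 6 digits and 30 seconds.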

At our end we need to store the seed, which ultimately has to be readable. But we have a hash for the password and a readable OTP seed, so two factors, which is a good start. There is no real way around storing the seed in a readable format, sadly.

But it does get quite complex, and this is what I am working through now.

1. How do you make sure setting up or resetting the TOTP is safe / authenticated? The current plan is texting a code as an alternative second factor before we disclose the seed/key as a QR code. Some actions need to be properly two factor authenticated.

2. Do we allow changes of password if not already two factor authenticated, and what of lost passwords - is that an independent process?

3. What access do staff have to reset or clear the TFA system? How do we defend against social engineering whilst not locking out genuine customers? How much staff training on social engineering can we do? How much staff time will this take?

4. What levels of control do we offer to customers, what degrees of paranoia do we support?

5. What of ancillary systems such as ordering or the CHAOS2 API? The current plan is that ordering will require the TOTP code if it is set up at all, even if normal logins do not require it on a "trusted browser".

It is never as simple as it sounds when looking purely at the technical side. Systems like this extend in to social engineering!

Anyway, we are starting with staff logins, and then moving to end user logins on our various systems, offering, and even recommending, two factor authentication.

P.S. Yes I waited more than 5 minutes after taking that picture so that even if you know my username and password you cannot use the code. And yes, we also protect against replay attacks on the code.

P.P.S. After some feedback we now only show the QR code until the installation has been confirmed, then it is no longer shown. That means the SMS codes and the authenticator codes use different seeds so we can tell which was used (and check for a clash just in case). Thanks for the feedback, this helps security.

NOTSCO (Not TOTSCO) One Touch Switching test platform (now launched)

I posted about how inept TOTSCO seem to be, and the call today with them was no improvement. It seems they have test stages... A "simul...