2018-04-27

Security is fun

The new Let's Encrypt / ACME stuff on the FireBrick is going well! We have alpha code releases and for a change we have people doing a lot of testing and feedback. Please do ask your supplier if you want your FireBrick enabled for alpha releases.

I say "for a change", but we have people that help with testing these early releases all the time.  However, the ACME code has really been very popular and people want to test it. The whole project has been fun for us too.

The good news is that for most testers the ACME system has "just worked" - they set up DNS for a hostname for the public IP of their FireBrick, put in the config, and within seconds they have a certificate for HTTPS and IPsec. Indeed, it is so simple and so fast that people are already asking if there is any way for the FireBrick to do the ACME as a front for other servers?!? I thought Apache made this easy, but apparently FireBrick is easier, according to customers. That is, perhaps, a good sign!

What is also fun is the number of ways people find to break it!!!

This is one of those cases where we will be making alphas with minor tweaks for a week or two I am sure. Some of the challenges have been surprising, although all, so far, have been minor.

We have people running networks that only work over IPv6, which is good. But then we have people doing that only over 6over4 tunnels, which causes some challenges with source IPv6 addresses.

We have had a few cases with firewall issues. The logic on the FireBrick is that the brick itself should not need a firewall. The individual services have, at the initial packet level, controls on IP access ("allow" lists, and "local-only" settings). Indeed, some versions of FB600 have no "firewall" as such, and only use these access controls. This is important for the ACME logic as the brick needs to be accessible via HTTP on port 80 from Let's Encrypt (or chosen CA) to authenticate the domain. The code allows port 80 access even when otherwise locked down so as to establish TCP and send an HTTP request for the specific ACME authentication URL. This means port 80 open for a few seconds every couple of months but only for that specific URL and no other access (unless allowed anyway). That does not work if firewall rules (on the FireBrick, or externally) stop that access.
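The port 80 rule described above boils down to a very narrow test on each incoming request. The following is a minimal Python sketch of that idea, not the FireBrick code: the function name, the pending_tokens set and the normally_allowed flag are all made up for illustration. The /.well-known/acme-challenge/ prefix is the path ACME uses for its HTTP challenge.

```python
# Hypothetical sketch: decide whether an HTTP request on port 80 is let through.
ACME_PREFIX = "/.well-known/acme-challenge/"


def allow_port80_request(path: str, pending_tokens: set, normally_allowed: bool) -> bool:
    """Return True if the request should be served.

    normally_allowed: the source IP already passes the usual access controls.
    pending_tokens:   challenge tokens for ACME authentications in progress.
    """
    if normally_allowed:
        return True  # access was allowed anyway, nothing special to do
    if not path.startswith(ACME_PREFIX):
        return False  # locked down: only the ACME URL is ever exposed
    token = path[len(ACME_PREFIX):]
    return token in pending_tokens  # and only while that challenge is pending


# While a challenge for token "abc" is pending, only its URL gets through.
print(allow_port80_request("/.well-known/acme-challenge/abc", {"abc"}, False))
print(allow_port80_request("/config", {"abc"}, False))
```

Once the challenge completes and the token is removed from the set, the same URL is refused again, which matches the "open for a few seconds" behaviour.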

We are also looking at what controls we need. For example, the FireBrick obtains new key pairs as needed (and we need at least two for ACME) from a FireBrick server (see P.S. below). But best practice is to create your own private keys on an isolated laptop and install via a direct connection to the FireBrick. That is hard work for most, so we have a convenient and secure way to get them from us over TLS (trusting us), but for good security reasons we need config settings to prevent that if the end user wants it locked down. We have that (acme-keygen) for those that need it.

We also load some root CA certificates so we can access Let's Encrypt servers and check them, but again we may need some controls so people can decide what CA certificates they have installed. So there are likely to be some changes to control that soon.

This is a key example of the fact that security is a compromise with convenience, and sometimes the convenience is important as without it people won't bother with the security. So a lot of what we are doing is a fine line - make it so people can, and will, use https with proper certificates, but that means some levels of trusting us as well. And we have to allow for the fine-tuning that some people may want to decide which side of that fine line they wish to work.

So it is a fun bit of code, not just at the "nuts and bolts" level as per my last blog post but at the high level of considering attack vectors and risk and trust.

I am sure more issues will come to light over the next week or two - so do try the alpha, and let us know any issues you have.

P.S. After a lot more research, we feel confident enough to have the key generation done locally in the FireBrick, which is obviously better. It means first time of ACME will take longer, obviously, as we need to collect entropy and generate keys. Obviously you can still load your own keys instead.

2018-04-19

Learning to ACME

[Technical]

I thought it may be fun to explain what I have been doing over the last week in more technical detail... I have been coding ACME (which is the protocol to get a certificate issued for a domain) for FireBricks. It is aimed at making HTTPS set up really easy. Right now you have to install keys and a certificate manually, but ACME will make it simple and seamless.

JSON

The obvious first step is that the protocol talks to the ACME server using JSON. We send JSON objects and receive JSON objects back. It is all done over https to the ACME server.

I commented on JSON recently, and even with years of experience using XML, and in some cases converting XML to/from JSON to work with Javascript, I am thinking JSON is not bad, and seems quite well suited to this job.

Not the first time I have handled JSON, but I needed a JSON library for the FireBrick, which only took a few hours.

JWS

However, the JSON that we send to the ACME server is not just a simple JSON object, oh no. It is a JSON Web Signature (JWS). This means you make some JSON, you then BASE64 encode the JSON, and make that a "payload" field in a new JSON object. You also make some more JSON which is various fields defining the public key you are using. These fields (e.g. "e" and "n" for RSA) are BASE64 encoded. That chunk of JSON is then BASE64 encoded, and included as a "protected" field in JSON. Then a signature is made, BASE64 encoded and added as a "signature" field. So what you post is a JSON object with three BASE64 encoded fields, two of which are BASE64 of other JSON objects. Yes, I know, complicated, but I got all that working. Thankfully the reply is just a JSON object as normal.
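To make the nesting concrete, here is a rough Python sketch of assembling such an object. It is only an illustration under assumptions: the field values, the example URL and the placeholder_sign function are invented, and placeholder_sign is just a hash standing in for a real RSA signature made with the account key. The signature is calculated over the encoded "protected" value, a dot, and the encoded "payload" value.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """URL-safe BASE64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Reverse of b64url: put the padding back before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def placeholder_sign(data: bytes) -> bytes:
    """NOT a real signature: a stand-in so the sketch runs without a key."""
    return hashlib.sha256(data).digest()


def make_jws(payload: dict, protected: dict, sign) -> str:
    """Wrap two JSON objects and a signature into the JSON object that is posted."""
    protected_b64 = b64url(json.dumps(protected).encode("utf-8"))
    payload_b64 = b64url(json.dumps(payload).encode("utf-8"))
    # The signature covers "<protected>.<payload>" in their encoded form.
    signing_input = (protected_b64 + "." + payload_b64).encode("ascii")
    return json.dumps({
        "protected": protected_b64,
        "payload": payload_b64,
        "signature": b64url(sign(signing_input)),
    })


# Hypothetical values throughout; a real key has a much longer "n".
protected = {
    "alg": "RS256",
    "jwk": {"kty": "RSA", "n": "placeholder-modulus", "e": "AQAB"},
    "nonce": "placeholder-nonce",
    "url": "https://acme.example/new-order",
}
payload = {"identifiers": [{"type": "dns", "value": "brick.example.com"}]}
print(make_jws(payload, protected, placeholder_sign))
```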

BASE64 but not quite normal BASE64

Another fun detail is all of the BASE64 used is not normal BASE64, which is A-Za-z0-9+/ but a URL safe BASE64 which is A-Za-z0-9-_ instead. So even simple debugging using base64 command line tools on linux often failed. Also, normal BASE64 pads with = at the end, but this is all unpadded. I never really understood why padding is used anyway, so quite on board with that one. Fortunately BASE64 is a doddle.
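As a small illustration of the difference, this Python sketch encodes the same two bytes both ways and shows how to put the padding back when decoding. The function names are my own choice for the example.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """URL-safe alphabet (- and _ instead of + and /), trailing = removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the padding the decoder expects, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


raw = b"\xfb\xff"
print(base64.b64encode(raw).decode("ascii"))  # normal BASE64: +/8=
print(b64url_encode(raw))                     # URL-safe, unpadded: -_8
```

When debugging by hand, the same translation (swap the two characters, add back the = padding) is needed before feeding a value to a tool that expects normal BASE64.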

Not just JSON

A somewhat frustrating part of the API is that it is not just about sending and receiving JSON objects. If only!

No, some of the key data you need is in the HTTP headers. Some of this is as per HTTP spec, but they did not have to do it that way - they could simply have sent all of the data you need within the JSON objects, or even duplicated in to JSON objects if they wanted. So you cannot use a simple HTTPS client to get a response, like curl, you have to also get selected header values as well. In some cases more than one such header.

So, my client library was updated to allow selected header extraction.
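In Python terms, selected header extraction could look something like the sketch below. It is not the FireBrick client library: pick_headers is a made-up helper, and the header names and example URLs are only there for illustration. It relies on the headers object offering a case-insensitive get_all(), as the message objects in Python's standard library do.

```python
from email.message import Message


def pick_headers(headers, wanted):
    """Return {name: [values...]} for each wanted header that is present.

    A header can appear more than once, so every value is kept.
    """
    found = {}
    for name in wanted:
        values = headers.get_all(name)  # case-insensitive lookup
        if values:
            found[name] = values
    return found


# Build some example response headers by hand (hypothetical values).
response_headers = Message()
response_headers["Content-Type"] = "application/json"
response_headers["Replay-Nonce"] = "placeholder-nonce"
response_headers["Link"] = '<https://acme.example/directory>;rel="index"'
response_headers["Link"] = '<https://acme.example/terms>;rel="terms-of-service"'

print(pick_headers(response_headers, ["Replay-Nonce", "Location", "Link"]))
```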

Replay-Nonce

There is also a special field, a nonce, a code issued by the server which you have to send in the next message you post. This is one of those header fields, but only if you POST something, not if you just GET, so you only grab this header on some of the interactions not all (arg!). You then use this in the JSON (not as a header!!) when you post the next item. This is all good in that it stops someone capturing an interaction and replaying it for their own use, but it is annoyingly inconsistent, header one way, JSON the other, for example. It is, however, included in what is signed in the JSON to avoid tampering.
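A minimal Python sketch of that bookkeeping is below, assuming a response headers object with a case-insensitive get(). The NonceStore class, the function names and the "RS256" algorithm value are assumptions made for the example, not anything taken from the FireBrick code.

```python
from email.message import Message


class NonceStore:
    """Remember the latest nonce from a response and hand it out exactly once."""

    def __init__(self):
        self._nonce = None

    def remember(self, headers) -> None:
        """Call with the headers of each response; keeps the nonce if there is one."""
        value = headers.get("Replay-Nonce")
        if value:
            self._nonce = value

    def take(self) -> str:
        """Return the stored nonce and forget it, as each one is single use."""
        if self._nonce is None:
            raise RuntimeError("no nonce available, fetch a fresh one first")
        nonce, self._nonce = self._nonce, None
        return nonce


def protected_header(store: NonceStore, url: str, jwk: dict) -> dict:
    """The nonce arrives as an HTTP header but goes back inside the signed JSON."""
    return {"alg": "RS256", "jwk": jwk, "nonce": store.take(), "url": url}


# Hypothetical response headers carrying a nonce.
headers = Message()
headers["Replay-Nonce"] = "placeholder-nonce"

store = NonceStore()
store.remember(headers)
print(protected_header(store, "https://acme.example/new-order", {"kty": "RSA"}))
```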

JWK Thumbprint

This is special. It is a hash, BASE64 encoded, of a chunk of JSON which holds the public key (JWK). You send exactly this JSON as part of the ACME messages (in the "protected" part). It is also part of the response the web server has to send when challenged. You have to prove you own the domain by making the web server respond to an HTTP request with a specific value.

What puzzles me is why they do not simply send a nice random string as part of the ACME protocol and expect me to respond with that?

But, no, we have to make this Thumbprint. However, this is where it gets a tad special. First off, the JSON has to be exactly right, with the exact fields you need in exactly the right order and no whitespace. If not, then the thumbprint does not match and all you know is it does not match!

Now, this is not a question of using the same JWK you sent in the ACME messages, no. Those can have fields in any order, for example, and work. No, it has to be exactly right. However, the ACME server accepts it in that format, so I can use one function to make both.

But it gets worse. The public key includes the modulus ("n") value, which is a long string of bytes BASE64 encoded. A small note mentions that any leading zero bytes must be stripped. This is not needed for the ACME messages in JWS to work, but if you don't do it, you get a different JWK Thumbprint and so nothing works. It is not even quite what you might do in ASN.1, where a leading zero byte is needed if the next byte has the top bit set, else you indicate the field is negative. This case is simply strip leading zero bytes. That took me hours of testing, comparing to examples, and re-reading the spec.
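Here is a rough Python sketch of the thumbprint as described: the required RSA fields only, in a fixed (sorted) order, no whitespace, leading zero bytes stripped, then SHA-256 and URL-safe unpadded BASE64. The function names and the toy key bytes are invented for the example, and a real modulus is far longer. The value the web server has to return for the HTTP challenge is the token, a dot, and this thumbprint.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """URL-safe BASE64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def rsa_jwk(modulus: bytes, exponent: bytes) -> dict:
    """Build the public key JSON, stripping any leading zero bytes first."""
    return {
        "kty": "RSA",
        "n": b64url(modulus.lstrip(b"\x00")),
        "e": b64url(exponent.lstrip(b"\x00")),
    }


def jwk_thumbprint(jwk: dict) -> str:
    """Hash of the canonical form: required fields only, sorted, no whitespace."""
    required = {name: jwk[name] for name in ("e", "kty", "n")}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    """What the web server sends back when the challenge URL is fetched."""
    return token + "." + jwk_thumbprint(jwk)


# Toy key material, far too short to be a real key.
toy_modulus = bytes([0x00, 0xC3, 0x5A, 0x17, 0x99])
toy_exponent = bytes([0x01, 0x00, 0x01])
jwk = rsa_jwk(toy_modulus, toy_exponent)
print(jwk_thumbprint(jwk))
print(key_authorization("placeholder-token", jwk))
```

Because the canonical form is rebuilt from the three required fields, the same function works whatever order or extra fields the JWK used in the ACME messages has.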

I am still quite surprised it is not simply some random string provided by the ACME message for the challenge.

Certificate Signing Request and ASN.1

Having got through the challenges and got as far as an authorised order I can send a final request with a CSR and get a certificate. yay!

But I have to make a CSR. So far the FireBrick code has had to decode ASN.1 for certificates and so on, but not generate much ASN.1 (SNMP is somewhat simplified in that area).

So, another couple of hours making an ASN.1 construction library, and then working out what goes in to a CSR. Thankfully tools like openssl will parse what I make at an ASN.1 and CSR level to tell me what I have.
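For a flavour of what an ASN.1 construction library has to do, here is a small Python sketch of the DER building blocks: tag, length, content. It is only an illustration with made-up function names, and it covers just lengths, non-negative INTEGERs and SEQUENCEs, which is a long way short of a whole CSR. It also shows the leading zero rule for integers with the top bit set.

```python
def der_length(length: int) -> bytes:
    """Short form below 128, otherwise 0x80 | byte count followed by the length."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, content: bytes) -> bytes:
    """Tag, length, value: the basic unit everything else is built from."""
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(value: int) -> bytes:
    """Non-negative INTEGER, with a leading zero byte if the top bit would be set."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # One spare bit is always left at the top so the value reads as positive.
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return der_tlv(0x02, body)


def der_sequence(*items: bytes) -> bytes:
    """SEQUENCE of already encoded items."""
    return der_tlv(0x30, b"".join(items))


print(der_integer(127).hex())   # 02017f
print(der_integer(128).hex())   # 02020080
print(der_sequence(der_integer(1), der_integer(2)).hex())  # 3006020101020102
```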

ASN.1 is a bit like riding a bike. Every time you work on it, it all sort of comes back to you...

I am also really impressed with the Let's Encrypt staging server in terms of the error messages it returns. They tell me exactly what I have wrong.

It turns out the certificate only needs the common name, which makes sense as LE only sign that as that is all they have proved, so no need for company and locality and all that.

I was quite chuffed that the first attempt to make a signed CSR just worked, I got the signing right. That is rare in coding.

Two key pairs

So, I finally have a valid and signed CSR, and send that, and get an error telling me the key used for the "account" (all the messages to/from the ACME server, and for the JWK Thumbprint) must be different to the key for the domain (i.e. in the CSR).

So now I have to faff with a second set of keys and make sure they are used in the right place.

Finally

Finally we get the certificate and install as normal. Actually, for Let's Encrypt it is two certificates as they have an intermediate one as well.

Testing on a new box, I added a hostname to the config, and 4 seconds later we had working https using that hostname. That is how simple it should be :-)

Next

I have a lot of tidying to do, and we need to make this a bit more polished before a release of FireBrick with this in place.

One idea is handling more than one hostname. I think this will be less common, and originally we thought we would get one certificate with "alt" names on it. However that does leak all of the other names for a brick if you access one. So plan is separately getting a certificate for each, and probably a status page showing progress, and expiry and so on.

To be fair, the host names used with Let's Encrypt are published anyway, which may be an issue for some. But ACME should work with other CAs, though we may have to add extra fields if someone wants to do that.

There are also access control issues over HTTP access during the authentication stage which needs allowing TCP port 80 automatically, even if only for a few seconds, and also being locked down to just the ACME authentication and no other access via that. Not hard, but needs doing with option to turn off.

So, maybe next week we will have alpha releases for people to test.

P.S. Some work over weekend - much more polished, and much better error reporting. Really close to an alpha for customers to test now.

2018-04-13

On line orders

I ordered something recently, on-line, on a whim, a new iPhone case.

I paid in UK pounds (£), and there was no immediate clue that the site was anything other than a normal UK supplier selling something to people in the UK.

I ordered on Saturday.

What pisses me off is how this is so much not the case. It seems it was a US company, and I know things can get from US, or almost anywhere in the world, to here, in a day or two, but they picked the slowest means to send to me on the planet from what I can see.


They seem to have used DHL (and people will know how unimpressed I am with them) and some sort of service that is slower than slow - delivered by snails. This is an item ordered and paid for on 7th. It is now 13th, and the damn thing is still in the US, FFS.

What is worse, there is a chance I end up with some damn duty or VAT bill to pay on top.

I just went to a web site - saw an item listed in £ and paid, why the hell is any of this my problem now?

All I can say is it better be a damn good phone case when it gets here!!!

P.S. It did eventually arrive. Quite a nice case for £16
P.P.S. It promptly broke, so I have a spigen one now, much nicer.

2018-04-10

FB2900 and Let's Encrypt

Well, the FB2900 is out!

The retail prices are lower than the old FB2700, £500+VAT for base, and £550+VAT for fully loaded with £35+VAT for rack mount kit. We should have the DC powered models available soon.

We have gone for lower prices to encourage more take up in the SME market. It is a bit of a gamble, but this is a really good product - not just a gateway router handling multiple ISPs, but even a VoIP switch / PABX. Perfect for most small businesses and even some large businesses.

The delay, for a week or so, was down to wanting to ensure https was working - this meant a lot of loading Windows VMs and testing on all sorts of different browsers. It needs manual loading of key pair and cert but it works well. I am really impressed with the work of my colleague, Cliff, on this, as the end result is just as fast to use as http. Very impressed.

It is timely as safari, and I am sure others, are now getting quite pushy on sending any form to a site not using https.


But we have said we expect to release more new code soon. The FireBrick s/w has always been free, and we have ensured the older models FB2500, FB2700 and the FB6000 series, all have the update for https now. But the next code issue should make it a lot cooler.

First off, I am planning some simple self signed stuff so you can use https before setting anything up. This is a bit naff, but every other idea we have come up with has flaws, and it is what everyone else does. The key thing is that it stops passive snooping as a threat, but it is not proper security.

You need a proper key pair, and certificate, to do https without warnings. The FB2900 has a key pair loaded individually as part of the production process which means we just need a certificate. The FB2500, FB2700 and FB6000 series will need a key pair loading. This is partly because we are not yet confident we can make a "good" key pair. We are very cautious when it comes to security, and this is an area that has gone wrong for others, so we want to be careful. When we are happy we can, we will, but whilst FB2900 has a hardware true random number generator, the older models do not, so it will not really help for non FB2900s.

But even with a key pair loaded, which is not hard, you need a certificate. This is where we plan to do way better than most embedded systems. We plan to use ACME with Let's Encrypt as standard!

So the idea is simple, tell the FireBrick its public hostname (and if not an FB2900 then load a key pair) and it will make a CSR, apply for a certificate from Let's Encrypt and install it and renew it as needed. Proper working https with no warnings and no faffing about renewing things. That's the plan.

The same certificates and keys can then be used for IPsec, obviously.

It is not that easy as it is aimed more at a traditional machine / server, and not an embedded device, but I believe we should be able to do that within a few weeks and have a new s/w release.

In the mean time, do enjoy the new s/w release for the whole range - which will be a formal release shortly after beta testers are done with it.

P.S. (18th April) All going well, and we expect to issue alpha code any day. Test bricks with just adding public host name working on https 4 seconds later. This is "fun" coding!

2018-04-09

Outward opening front door

One of the decisions I made in my garage conversion was to have an outward opening external door.

This is, as I understand it (at least in the UK), unusual. It was mainly an attempt to maintain as much internal space as possible.

There are issues, the hinges are outside and so subject to attack, which is why I have "hinge bolts" in the door frame. Also, when someone calls, you end up opening the door in to them (rare as I have a window).

But I noticed whilst watching Stargate SG1 Revisions that all of the town had doors that opened outwards like mine - possibly because the rooms are all small.

They are filmed in a place called Fantasy Gardens which is used in other Stargate episodes, and actually, a lot of films!

Fascinating place it seems, albeit torn down now!


Standards (TLS)

XKCD tried to explain a bit about standards...


But there are some other aspects, even when you have good, single, consistent standards the challenge can be implementations.

My fun today revolved around TLS and https.

So, the way it is meant to work, is when we close a connection, we send a TLS level close alert, and the other end sends us one, and then we close the TCP connection underneath. This is pretty simple and works for almost all connections...

Except...

Testing Edge on MS Windows 10. Some of the pages on the FireBrick are dynamic and so work on a Connection: close basis. This means, instead of a Content-Length at the start, the data in the page is sent until the connection is closed.

For http this is simple, we close the TCP at the end, job done.

For https it should be simple, we do a TLS close message, we should get one back and then close TCP, but no... We get no reply to the TLS level close, and TCP stays open. The web browser shows the page not completely loaded, and so the onLoad javascript does not run and all sorts of other nasty side effects, WTF?!

The fix is not too hard, a half close on tx side to send a FIN after the TLS level close, allowing far end to send a TLS close back or just close at TCP level (which is what Edge does).
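The half close itself is an ordinary socket operation. This Python sketch shows just that step on a plain socket, assuming the TLS level close alert has already been sent; it is not the FireBrick code, and the function name is made up. Shutting down only the transmit side sends the FIN while still letting us read whatever the far end sends before it closes.

```python
import socket


def half_close_then_drain(sock: socket.socket) -> bytes:
    """Send FIN on our side, then read until the far end closes its side."""
    sock.shutdown(socket.SHUT_WR)  # tx side closed, rx side still open
    received = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # far end has closed (with or without a TLS close first)
            break
        received += chunk
    sock.close()
    return received


# Demonstrate with a connected pair of sockets standing in for the TCP session.
ours, theirs = socket.socketpair()
ours.sendall(b"end of page")          # last data we had to send
theirs.sendall(b"close from far end")  # stands in for a TLS close from the far end
theirs.shutdown(socket.SHUT_WR)        # far end closes its side
print(half_close_then_drain(ours))
print(theirs.recv(4096))               # far end still gets our final data
theirs.close()
```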

But it has taken three engineers several hours of work today to diagnose and work around this. Arrrg!

What is also fun is we find Edge appears to do a sort of speculative connection. If it does not have a clean keep-alive session it makes a new connection when it has nothing to say, just in case. This was causing exception handling our side (as we expect a prompt request when we get a connection) which also closed TLS uncleanly and impacted session resumption. We have had to make changes for that too.

The good news, after all that, is we now work with Edge (we already worked with pretty much everything else), so should finally have the new https code release this evening at some point. Watch this space.

I have to say, and this is all down to Cliff, that the https is really surprisingly snappy and responsive. One customer said he could swear it was faster than http, which makes no sense. I am quite impressed.

2018-04-08

JSON vs XML

Recently, I tweeted

Well, I am starting to wonder if JSON is better than XML in some ways now. I have coded a new JSON library for the FireBrick today. It was not hard, in fact, the simplicity does make me wonder if neater in some ways than XML.

Both have a clear formal spec, but what do they have different?

  • XML has all sorts of special cases like CDATA and processing instructions and comments, JSON does not
  • XML does not allow a null character even escaped, JSON allows it
  • XML has all of that pesky namespace stuff. It has its place but for a lot of systems it does not help matters and makes it more complex
  • XML has no concept of even simple types for data, JSON has strings, numbers, boolean, and null as distinct and identifiable types.
  • XML only has objects with attributes and sub objects, JSON has arrays which XML does not.
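A few of those points are easy to see with standard library parsers. This is just a quick Python illustration, nothing to do with the FireBrick library, and the xml_accepts helper is made up for the example: JSON values come back as distinct types, arrays are native, and an escaped null character is fine in JSON but rejected by the XML parser.

```python
import json
import xml.etree.ElementTree as ET

# JSON: distinct types, arrays, and an escaped null character in a string.
text = '{"name": "a\\u0000b", "count": 3, "ok": true, "none": null, "list": [1, 2, 3]}'
doc = json.loads(text)
for key, value in doc.items():
    print(key, type(value).__name__)


def xml_accepts(xml_text: str) -> bool:
    """True if the XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(xml_accepts("<a>ok</a>"))    # an ordinary element parses
print(xml_accepts("<a>&#0;</a>"))  # a null character, even escaped, does not
```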

The JSON library was actually really easy and the syntax is very strict, surprisingly so, to be honest.

So I am leaning towards JSON as being better than XML for now.

What is this all in aid of? Well FireBricks use code we control and we have coded everything from operating system startup to IPSec. So I needed to make a JSON library. There are "standard" open source libraries we could use, but having only taken a day to do this I suspect integrating something in to our build system would have taken longer.

But why do I need JSON all of a sudden? Well ACME uses JSON, and I am working on ACME coding to allow FireBricks to easily have Let's Encrypt certificates for https. So I start with a JSON library.

A good day's work I think.

2018-04-06

IANAL

Even though I am not a lawyer, I do get asked for advice sometimes by friends and family, and with the caveat that I am not a lawyer I sometimes dig out the relevant legislation and provide some wisdom from my experience in life :-)

Of course, I will be interested if my lawyer friends say I have this one wrong, but one of the things that has come from EU membership is some tighter consumer protections.

A key one is "The Consumer Contracts (Information, Cancellation and Additional Charges) Regulations 2013" - I have mentioned it before.

The reason it came up is a friend of mine saw this on a web site when ordering an item from a UK company, as a consumer, so subject to that law.


The clear implication by saying the "insured" option means no loss to you for damage or loss in transit is the converse that if you choose uninsured then you would lose out if damage or loss in transit.

However, that is not something they actually state, it is simply implied, so maybe they are just trying to be cunning to get you to pay the extra for insurance so they don't have to.

The law, in regulation 43 of that consumer contracts stuff, is pretty clear :-

Passing of risk

43.—(1) A sales contract is to be treated as including the following provisions as terms.
(2) The goods remain at the trader’s risk until they come into the physical possession of
(a)the consumer, or
(b)a person identified by the consumer to take possession of the goods.
(3) Paragraph (2) does not apply if the goods are delivered to a carrier who—
(a)is commissioned by the consumer to deliver the goods, and
(b)is not a carrier the trader named as an option for the consumer.
(4) In that case the goods are at the consumer’s risk on and after delivery to the carrier.
(5) Paragraph (4) does not affect any liability of the carrier to the consumer in respect of the goods. 

So, if you use a courier they offer, then the trader has all the risk until it physically arrives in your possession, basically! No need to pay extra for insured courier.

Watch out for that when ordering on-line...

FB2900

Well, Cliff has been working hard on this, and we are expecting to start shipping any day.

This is indeed a FireBrick https access from an iPhone!


Anyone who has alpha code access on their FireBrick (FB2500, FB2700 or FB6000 series) can test https now.

You need to install a key pair and certificate, which could be self signed or (as per my testing) a Let's Encrypt one matching the hostname of my test FireBrick. The plan is that in the following release of code, the FB2900 will be able to do this automatically and use ACME to obtain and maintain a certificate to make it easy.

Email the firebrick testers mailing list with any feedback.

If testing goes well over the next few days we'll be able to announce the FB2900 details and launch.

2018-04-01

FireBrick FB2900

FireBricks have been around nearly two decades now, since before things like https were a consideration. Whilst we have embraced IPv6 as part of the design of the current FireBricks from the start, https was not top of the list. Why? Well, the FireBrick web interface is usually only for management of the FireBrick. The idea is that most customers would have it on a separate management LAN, or locally connected, or even behind an IPsec tunnel (which the FireBrick can do), so https was not actually needed that much.

However, https is more and more a thing and becoming so much the normal way of working (with browsers warning if not https even), so we are including it in the new FB2900, and the existing FB2500, FB2700, and FB6000 series as a free software upgrade.

In fact, working https, with an SSL Labs score of at least "A", is pretty much the reason for the current delay on the FB2900 launch. We have finally sorted the other issues which had added months and months to the launch of the FB2900, but as https is almost ready we are going to ensure the launch has https. It is literally a matter of days away - I have working https on my test FireBrick (SSL labs score "B") even now, thanks to hard work of my colleague Cliff.

We then follow on with ssh, and the plan is ACME support to use Let's Encrypt to make https really easy to install - point a domain/hostname at the brick's IP and bingo, it will be properly certified https. It will still have all of the access controls, but with caveats for ACME certificate renewals. The ACME Let's Encrypt certificates will help with IPsec configurations as well.

Sadly, one of the things we would have loved to do is impossible. We wanted a brick "out of the box" to work with https with no warnings. We could maybe include a cert for my.firebrick.uk or some variant to do this, but any means by which a FireBrick has a private key in the code would mean someone could get a FireBrick and JTAG or some such to extract it from the flash. It would allow that key to be extracted and misused. The only real answer will be for a FireBrick to have a unique key pair and obtain a signed certificate by ACME, or similar, and that can only happen after it has a public hostname and internet connection. So the initial set up will have to be over http or with a "security exception" to talk https. Typically this is literally a laptop connected to the FireBrick, so either is acceptable, but a shame no way to avoid that. It would be interesting to consider the ways embedded devices could solve that within an https and certificate framework one day (TTL 1 and tied to MAC address or something?).

So, FB2900 really close now... Many boxes on the shelves ready to ship... Watch this space!

P.S. I won't bore you with the days of work on the outer packaging shipping label featured in the image above. Lots of svg, barcodes, and postscript and stuff with UPCs and things. All very boring I am sure... :-)

P.P.S. We may forego the "A" rating at launch for the working on all main browsers and not add more delay.

P.P.P.S. Testers that can load "alpha" releases should hopefully have access to play with this in the next day or so.