2018-04-19

Learning to ACME

[Technical]

I thought it may be fun to explain in more technical detail what I have been doing over the last week... I have been coding ACME (the protocol used to get a certificate issued for a domain) for the FireBrick. The aim is to make HTTPS setup really easy. Right now you have to install keys and a certificate manually, but ACME will make it simple and seamless.

JSON

The obvious first step is that the protocol talks to the ACME server using JSON. We send JSON objects and receive JSON objects back, all over HTTPS to the ACME server.

I commented on JSON recently. Even with years of experience using XML, and in some cases converting XML to/from JSON to work with JavaScript, I am thinking JSON is not bad, and it seems quite well suited to this job.

It is not the first time I have handled JSON, but I needed a JSON library for the FireBrick, which only took a few hours to write.

JWS

However, the JSON that we send to the ACME server is not just a simple JSON object, oh no. It uses JSON Web Signature (JWS). This means you make some JSON, you then BASE64 encode the JSON, and make that the "payload" field in a new JSON object. You also make some more JSON, a header which holds the public key you are using (as a JWK, whose fields such as "e" and "n" for RSA are themselves BASE64 encoded) along with details such as the signing algorithm and the nonce. That chunk of JSON is then BASE64 encoded, and included as the "protected" field. Then a signature is made over the encoded "protected" and "payload" strings (joined with a "."), BASE64 encoded and added as the "signature" field. So what you post is a JSON object with three BASE64 encoded fields, two of which are BASE64 of other JSON objects. Yes, I know, complicated, but I got all that working. Thankfully the reply is just a normal JSON object.
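A minimal sketch of that construction in Python, assuming the flattened JWS layout described above (Python is used purely for illustration, not because the FireBrick runs it). The signing itself is left to a caller-supplied `sign` function, a hypothetical hook that would use the account private key in real use, so the sketch only shows how the three fields are built and exactly which bytes get signed.

```python
import base64
import json
from typing import Callable, Dict


def b64url(data: bytes) -> str:
    """URL-safe BASE64 with the '=' padding stripped, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def compact_json(obj: Dict) -> bytes:
    """Serialise JSON without any whitespace."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def make_jws(protected: Dict, payload: Dict,
             sign: Callable[[bytes], bytes]) -> Dict[str, str]:
    """Build the flattened JWS object that gets POSTed to the ACME server.

    `sign` is a hypothetical callback that signs bytes with the account
    private key (e.g. RSASSA-PKCS1-v1_5 with SHA-256 for alg "RS256").
    """
    protected_b64 = b64url(compact_json(protected))
    payload_b64 = b64url(compact_json(payload))
    # The signature covers the two encoded strings joined by a ".".
    signing_input = (protected_b64 + "." + payload_b64).encode("ascii")
    return {
        "protected": protected_b64,
        "payload": payload_b64,
        "signature": b64url(sign(signing_input)),
    }
```

The result is then serialised with `json.dumps` and posted. Keeping the signer as a parameter means the same function serves whichever key and algorithm is in use.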

BASE64 but not quite normal BASE64

Another fun detail is that the BASE64 used is not normal BASE64 (A-Za-z0-9+/) but a URL-safe BASE64 (A-Za-z0-9-_) instead. So even simple debugging using base64 command line tools on Linux often failed. Also, normal BASE64 pads with = at the end, but this is all unpadded. I never really understood why padding is used anyway, so I am quite on board with that one. Fortunately BASE64 is a doddle.
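To make the difference concrete, here is a minimal Python sketch of the unpadded URL-safe variant using only the standard library (the function names are illustrative). Decoding has to put the padding back because Python's decoder insists on it; the length of the unpadded text is enough to work out how much padding was dropped, which is why it can be left off.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Encode with the URL-safe alphabet (A-Z a-z 0-9 - _) and no '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode unpadded URL-safe BASE64, restoring the padding Python expects."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)
```

For example, `b64url_encode(b"\xfb\xff")` gives `-_8`, where normal BASE64 would give `+/8=`, which is exactly the sort of thing that trips up command line tools.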

Not just JSON

A somewhat frustrating part of the API is that it is not just about sending and receiving JSON objects. If only!

No, some of the key data you need is in the HTTP headers. Some of this is as per the HTTP spec, but they did not have to do it that way: they could simply have sent all of the data you need within the JSON objects, or even duplicated it into the JSON objects if they wanted. So you cannot just use a simple HTTPS client like curl to get a response; you also have to extract selected header values, in some cases more than one.

So, my client library was updated to allow selected header extraction.
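Header extraction can be as simple as the following sketch, which assumes the raw response head is available as text (`select_headers` is a made-up name for illustration). The ACME headers worth picking out include Replay-Nonce and Location, which gives the URL of a newly created resource such as an order.

```python
from typing import Dict, Iterable


def select_headers(raw_head: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Pick selected header values out of a raw HTTP response head.

    Header names are case-insensitive, so results are keyed in lower case.
    Only headers named in `wanted` that are actually present are returned.
    """
    want = {name.lower() for name in wanted}
    found: Dict[str, str] = {}
    # Skip the status line, e.g. "HTTP/1.1 201 Created".
    for line in raw_head.split("\r\n")[1:]:
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in want:
            found[name.strip().lower()] = value.strip()
    return found
```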

Replay-Nonce

There is also a special field, a nonce: a code issued by the server which you have to send in the next message you post. This is one of those header fields (Replay-Nonce), but only if you POST something, not if you just GET, so you only grab this header on some of the interactions, not all (argh!). You then use it in the JSON (not as a header!!) when you post the next item. This is good in that it stops someone capturing an interaction and replaying it for their own use, but it is annoyingly inconsistent: header one way, JSON the other. It is, however, included in what is signed in the JSON, to avoid tampering.
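One way to tame the nonce juggling is a small store that remembers the latest Replay-Nonce seen in any response and hands each one out exactly once. This is a sketch with hypothetical names: `fetch_new` stands in for however a fresh nonce is obtained when none is to hand (ACME v2 provides a "newNonce" resource for this), and `extra` carries the other protected header members such as the JWK.

```python
from typing import Callable, Dict, Optional


class NonceStore:
    """Remember the latest Replay-Nonce and hand each one out only once."""

    def __init__(self, fetch_new: Callable[[], str]):
        # Hypothetical hook for getting a fresh nonce when none is stored.
        self._fetch_new = fetch_new
        self._nonce: Optional[str] = None

    def remember(self, headers: Dict[str, str]) -> None:
        """Keep the nonce from a response's headers (keys lower-cased), if any."""
        nonce = headers.get("replay-nonce")
        if nonce:
            self._nonce = nonce

    def take(self) -> str:
        """Return a nonce for the next POST, never reusing one."""
        nonce, self._nonce = self._nonce, None
        return nonce if nonce is not None else self._fetch_new()


def protected_header(store: NonceStore, alg: str,
                     extra: Dict[str, object]) -> Dict[str, object]:
    """The nonce goes in the signed JWS protected header, not an HTTP header."""
    header: Dict[str, object] = {"alg": alg, "nonce": store.take()}
    header.update(extra)
    return header
```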

JWK Thumbprint

This is special. It is a hash (SHA-256), BASE64 encoded, of a chunk of JSON which holds the public key (the JWK). You send exactly this JSON as part of the ACME messages (in the "protected" part). The thumbprint is also part of the response the web server has to send when challenged: you prove you own the domain by making the web server respond to an HTTP request with a specific value, which is the challenge token, a ".", and then the thumbprint.
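Once the thumbprint exists, the http-01 response itself is tiny. A minimal sketch, assuming the thumbprint has already been computed and that tokens are restricted to the URL-safe BASE64 alphabet (rejecting anything else also stops a hostile token turning into a path like `../`). The function name is hypothetical.

```python
import re
from typing import Tuple

TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def http01_response(token: str, thumbprint: str) -> Tuple[str, str]:
    """Return (path, body) that the web server must serve for an http-01 challenge."""
    if not TOKEN_RE.fullmatch(token):
        raise ValueError("challenge token is not URL-safe BASE64")
    path = "/.well-known/acme-challenge/" + token
    # The "key authorization": token, a dot, then the account key's JWK thumbprint.
    return path, token + "." + thumbprint
```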

What puzzles me is: why not simply send a nice random string as part of the ACME protocol and expect me to respond with that?

But, no, we have to make this Thumbprint, and this is where it gets a tad special. First off, the JSON has to be exactly right: exactly the required fields, in exactly the right (alphabetical) order, and no whitespace. If not, the thumbprint does not match, and all you know is that it does not match!

Now, this is not simply a question of reusing the JWK you sent in the ACME messages. There, the fields can be in any order, for example, and it still works. For the thumbprint it has to be exactly right. However, the ACME server also accepts the JWK in that exact format, so I can use one function to make both.

But it gets worse. The public key includes the modulus ("n"), which is a long string of bytes, BASE64 encoded. A small note mentions that any leading zero bytes must be stripped. This is not needed for the JWS in the ACME messages to work, but if you don't do it, you get a different JWK Thumbprint and so nothing works. It is not even quite what you might do in ASN.1, where you keep a leading zero byte if the next byte has its top bit set, as otherwise the value would be read as negative. Here you simply strip all leading zero bytes. That took me hours of testing, comparing to examples, and re-reading the spec.
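A sketch of the thumbprint calculation for an RSA key, following RFC 7638: only the required members (`e`, `kty`, `n`), keys sorted, no whitespace, SHA-256, then unpadded URL-safe BASE64. Converting the integers to bytes at their minimum length is what strips the leading zeros. The function names are illustrative, not from any particular library.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def int_to_bytes(value: int) -> bytes:
    """Big-endian bytes with no leading zero bytes (unlike a DER INTEGER)."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def rsa_jwk(n: int, e: int) -> str:
    """Canonical RSA JWK: required members only, sorted keys, no whitespace."""
    members = {"e": b64url(int_to_bytes(e)), "kty": "RSA",
               "n": b64url(int_to_bytes(n))}
    return json.dumps(members, sort_keys=True, separators=(",", ":"))


def jwk_thumbprint(n: int, e: int) -> str:
    """SHA-256 of the canonical JWK, URL-safe BASE64 without padding."""
    return b64url(hashlib.sha256(rsa_jwk(n, e).encode("ascii")).digest())
```

If the modulus comes out of a DER structure with a leading 0x00, passing it through `int.from_bytes(raw, "big")` and then `int_to_bytes` drops that byte.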

I am still quite surprised it is not simply some random string provided by the ACME message for the challenge.

Certificate Signing Request and ASN.1

Having got through the challenges and as far as an authorised order, I can send a final request with a CSR and get a certificate. Yay!

But I have to make a CSR. So far the FireBrick code has had to decode ASN.1 for certificates and so on, but not generate much ASN.1 (SNMP is somewhat simplified in that area).

So, another couple of hours making an ASN.1 construction library, and then working out what goes into a CSR. Thankfully tools like openssl will parse what I make, at both the ASN.1 and CSR level, to tell me what I have.
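The core of an ASN.1 construction library is DER's tag-length-value encoding. A minimal Python sketch with hypothetical helper names, covering lengths, INTEGERs (note the extra 0x00 byte when the top bit would otherwise be set, the ASN.1 rule contrasted with the JWK above), OBJECT IDENTIFIERs and SEQUENCEs:

```python
def der_length(n: int) -> bytes:
    """Short form below 128, else 0x80|byte-count followed by big-endian bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


def der_integer(value: int) -> bytes:
    """Non-negative INTEGER, with a leading 0x00 if the top bit would be set."""
    if value < 0:
        raise ValueError("only non-negative integers in this sketch")
    # (bit_length + 8) // 8 leaves room for a clear sign bit.
    return der_tlv(0x02, value.to_bytes((value.bit_length() + 8) // 8, "big"))


def der_oid(dotted: str) -> bytes:
    """OBJECT IDENTIFIER: first two arcs combined as 40*a+b, each arc base-128."""
    parts = [int(p) for p in dotted.split(".")]
    body = bytearray()
    for arc in [40 * parts[0] + parts[1]] + parts[2:]:
        chunk = [arc & 0x7F]  # the last byte of an arc has its top bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # earlier bytes have it set
            arc >>= 7
        body.extend(reversed(chunk))
    return der_tlv(0x06, bytes(body))


def der_sequence(*items: bytes) -> bytes:
    return der_tlv(0x30, b"".join(items))
```

For example, `der_oid("2.5.4.3")` (commonName) gives the bytes `06 03 55 04 03`, and `der_integer(128)` gives `02 02 00 80`.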

ASN.1 is a bit like riding a bike. Every time you work on it, it all sort of comes back to you...

I am also really impressed with the Let's Encrypt staging server in terms of the error messages it returns. They tell me exactly what I have wrong.

It turns out the certificate only needs the common name, which makes sense as LE only sign that as that is all they have proved, so no need for company and locality and all that.
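With an ASN.1 construction library in place, the part of a PKCS#10 CSR that gets signed (the CertificationRequestInfo), with only a common name in the subject, might look like this sketch. It repeats a tiny TLV helper so it stands alone; `spki` is assumed to be the DER SubjectPublicKeyInfo of the domain key, produced elsewhere, and the function names are hypothetical. The complete CSR is then a SEQUENCE of these bytes, the signature algorithm identifier, and a BIT STRING holding the signature over these bytes.

```python
def der_length(n: int) -> bytes:
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


OID_COMMON_NAME = bytes.fromhex("0603550403")  # 2.5.4.3, already DER encoded


def subject_cn_only(common_name: str) -> bytes:
    """Name ::= SEQUENCE OF SET OF AttributeTypeAndValue, holding just a CN."""
    value = tlv(0x0C, common_name.encode("utf-8"))  # UTF8String
    atv = tlv(0x30, OID_COMMON_NAME + value)
    return tlv(0x30, tlv(0x31, atv))


def certification_request_info(common_name: str, spki: bytes) -> bytes:
    """Version 0, subject, public key info, and an empty attributes set."""
    version = bytes.fromhex("020100")   # INTEGER 0
    attributes = bytes.fromhex("a000")  # [0] with no attributes
    return tlv(0x30, version + subject_cn_only(common_name) + spki + attributes)
```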

I was quite chuffed that the first attempt to make a signed CSR just worked: I got the signing right first time. That is rare in coding.

Two key pairs

So, I finally have a valid and signed CSR, and send that, and get an error telling me the key used for the "account" (all the messages to/from the ACME server, and for the JWK Thumbprint) must be different to the key for the domain (i.e. in the CSR).

So now I have to faff with a second set of keys and make sure they are used in the right place.

Finally

Finally we get the certificate and install it as normal. Actually, for Let's Encrypt it is two certificates, as they have an intermediate one as well.
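Assuming an ACME v2 style response, where the certificate arrives as a PEM bundle with the leaf first followed by the intermediate, splitting it into individual certificates is straightforward. Note that this is ordinary padded BASE64, not the URL-safe kind. A sketch with a hypothetical function name:

```python
import base64
import re
from typing import List

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----", re.S)


def split_pem_chain(pem: str) -> List[bytes]:
    """Return the DER bytes of each certificate in a PEM bundle, in order."""
    # Remove line breaks inside each block, then decode normal BASE64.
    return [base64.b64decode("".join(body.split()))
            for body in PEM_CERT.findall(pem)]
```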

Testing on a new box, I added a hostname to the config, and 4 seconds later we had working https using that hostname. That is how simple it should be :-)

Next

I have a lot of tidying to do, and we need to make this a bit more polished before a release of FireBrick with this in place.

One idea is handling more than one hostname. I think this will be less common, and originally we thought we would get one certificate with "alt" names on it. However, that leaks all of the other names for a brick if you access one. So the plan is to get a separate certificate for each, probably with a status page showing progress, expiry and so on.
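A sketch of the per-hostname plan, assuming one certificate per name and the renewal rule mentioned in the comments (renew a configured number of days ahead, default 30). The function name and the `expiry` mapping from hostname to certificate expiry time are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def hostnames_needing_certificates(
    hostnames: List[str],
    expiry: Dict[str, datetime],
    now: datetime,
    renew_days: int = 30,
) -> List[str]:
    """One certificate per hostname, so no name leaks via another's alt names.

    A hostname needs a (new) certificate if it has none yet, or if its
    current one expires within `renew_days`.
    """
    due = []
    for name in hostnames:
        not_after: Optional[datetime] = expiry.get(name)
        if not_after is None or not_after - now <= timedelta(days=renew_days):
            due.append(name)
    return due
```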

To be fair, the host names used with Let's Encrypt are published anyway, which may be an issue for some. But ACME should work with other CAs, though we may have to add extra fields if someone wants to do that.

There are also access control issues during the authentication stage: TCP port 80 needs to be allowed automatically, even if only for a few seconds, and locked down to just the ACME authentication with no other access via that port. Not hard, but it needs doing, with an option to turn it off.
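A minimal sketch of that port 80 lock-down, assuming the HTTP layer can hand each request's method and path to a filter: only GETs for a currently pending challenge token are answered, only within a short window, and everything else is dropped. `ChallengeGate` and its 30 second default window are illustrative choices, not FireBrick behaviour.

```python
import re
import time
from typing import Dict, Optional, Tuple

CHALLENGE_PATH = re.compile(r"/\.well-known/acme-challenge/([A-Za-z0-9_-]+)")


class ChallengeGate:
    """Allow plain HTTP only for pending ACME challenge tokens, briefly."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        # token -> (key authorization to serve, deadline)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def open(self, token: str, key_authorization: str,
             now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._pending[token] = (key_authorization, now + self.window)

    def close(self, token: str) -> None:
        self._pending.pop(token, None)

    def handle(self, method: str, path: str,
               now: Optional[float] = None) -> Optional[str]:
        """Return the body to serve, or None to drop the request."""
        now = time.monotonic() if now is None else now
        match = CHALLENGE_PATH.fullmatch(path)
        if method != "GET" or not match:
            return None  # no other access via port 80
        entry = self._pending.get(match.group(1))
        if entry is None or now > entry[1]:
            return None  # unknown token, or the window has passed
        return entry[0]
```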

So, maybe next week we will have alpha releases for people to test.

P.S. Some work over the weekend: much more polished, and much better error reporting. Really close to an alpha for customers to test now.

18 comments:

  1. > It turns out the certificate only needs the common name, which makes sense as LE only sign that as that is all they have proved, so no need for company and locality and all that.

    Actually, it only needs the Subject Alternative Name, the Common Name is deprecated.

    1. Not only deprecated. I recall some internal testing certificates I had previously generated with only Common Name broke one day on some browser that had just been updated (can't remember which browser, sadly, but it would have been either Firefox, Chrome or Safari). I had to regenerate it with Subject Alternative Name to get it to work.

    2. Yes, the abuse of X.509's Common Name to write names from the Internet was deprecated _last century_

      For many years the CA/B Baseline Requirements (rules the Certificate Authorities agree to in order to get trusted by the famous web browsers) said all certificates must list Subject Alternative Names, and if CN is used it must be taken from one of the SANs. CAs that ignored this rule have gradually been getting spanked for it, and compliance eventually reached the point where the next step was possible:

      A few years back some major browsers like Firefox and Chrome changed their algorithm so that they would only try to interpret CN if the certificate entirely lacked Subject Alternative Names. Last year, they just stopped parsing CN regardless, at least for certificates from the Web PKI (ones the browser trusts out of the box)

  2. > Testing on a new box, I added a hostname to the config, and 4 seconds later we had working https using that hostname. That is how simple it should be :-)

    Presumably at some point in the flow (when the firebrick is creating the 'account' with Lets Encrypt) you prompt the user to agree to their terms (and provide an email address for notifications)?

    > To be fair, the host names used with Let's Encrypt are published anyway, which may be an issue for some. But ACME should work with other CAs, though we may have to add extra fields if someone wants to do that.

    ALL certificates will have to be published to certificate transparency logs (and prove that they have been logged using SCT) to be trusted by Chrome as of the end of this month.

    https://groups.google.com/a/chromium.org/forum/#!topic/ct-policy/wHILiYf31DE

    1. An interesting point - we may have to add something for that (terms). At the moment there is not a means to do that (other than saying in the manual only do this if you agree the terms), but if I do a status page it could well stop at that point and need a confirmation... As I say, some tidying up still to do.

  3. That sounds like a really *satisfying* bit of coding.

    Although an extreme case of, "Do not talk to me on pain of pain".

    1. The usual ups and downs of coding and moments of banging head against wall, but yes, very satisfying. Now I have the tedious "polishing" to do to make it pretty...

  4. How are you handling the 90 day renewal required?

    1. Configured number of days ahead (default 30) it does the renewal automatically.

  5. Is this using ACMEv2 ?

    https://community.letsencrypt.org/t/staging-endpoint-for-acme-v2/49605

  6. "It turns out the certificate only needs the common name, which makes sense as LE only sign that as that is all they have proved"

    Technically, Let's Encrypt cares only about the SAN dnsNames. Because common tools like openssl make generating SAN dnsName CSRs tortuous, they generously parse a CN, figure out if that's a plausible DNS name and if so the certificate is issued for that SAN dnsName.

    "I am still quite surprised it is not simply some random string provided by the ACME message for the challenge."

    The ACME design binds all such requests to your specific ACME account via its public key. One neat thing you can do about this is that you can configure a web server to answer all ACME http-01 connections with a "correct" answer for your account without it having access to the ACME account keys. So long as you keep control of your account keys, nobody else can abuse this, and you don't need to weld the web server and ACME client software together to make it work.

  7. It's over a decade since I did any ASN.1. I started with it way back in 1987 as part of an X.400 email system for Data General. The only thing out of that lot still going is ASN.1 itself. Makes me feel old thinking about it all.

  8. Does ACME work only when the brick has the web interface available on 80/443?

    1. It needs the CA to be able to get to port 80 on the hostname. But the FireBrick can be locked down on web interface with allow IP lists and local-only controls and even use other ports for http/https or be https only. The code can open port 80 just for the few seconds for the authentication and only for the authentication URL and not any other access.

  9. This is a nice blog but I question why you chose to reimplement the wheel. The FireBrick is very nice but might have been cheaper/quicker to market if you just used the well-tested off the shelf libraries!

    1. And the same as every other device on the market. When we started there were not suitable small firewall/routers for the new ADSL services.

  10. Pretty sure the firebrick USP is not that it uses custom code, more that it works and is reliable with UK support.

    As it is, it's not particularly great due to the home-rolled security code, and is quite expensive. That limits the market to the non-price-sensitive folks.

    Would have made more sense to me to pull stock router hardware, marry it to an OTS battery backup board with minimal custom development for the DC input (48V input might have been pretty cheap!), and only focus dev time toward the unique DSL software features, but I guess I don't run FireBrick!
