Friday, 20 November 2015

BGP Blackhole routes

A technical post for a change...

BGP is the protocol that distributes routes around the Internet, and one of the features of BGP is the "community tags" that can be attached to a route announcement.

There are a few that are standard and useful, such as limiting the announcements to the local AS. Community tags are also often used within a network to record where a route entered it.

NTT (one of the big transit providers) have a great page on how they use communities. They use them not only to identify where routes came in, but also to control how routes are handled in their network.

A community tag is 32 bits and conventionally written as decimal 16 bits, colon, and decimal 16 bits. Where you have an AS number that fits in 16 bits it is common for the first 16 bits to be the AS that defines or uses the tag.
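To make the notation concrete, here is a minimal Python sketch of converting between the written form and the 32 bit value. The function names are made up for illustration and are not from any BGP library.

```python
def parse_community(text: str) -> int:
    """Convert 'high:low' notation into the 32-bit community value."""
    high_text, low_text = text.split(":")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError(f"each half must fit in 16 bits: {text!r}")
    # The first number occupies the top 16 bits, the second the bottom 16.
    return (high << 16) | low


def format_community(value: int) -> str:
    """Convert a 32-bit community value back into 'high:low' notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"community must fit in 32 bits: {value}")
    return f"{value >> 16}:{value & 0xFFFF}"
```

So `parse_community("2914:666")` puts NTT's AS number 2914 in the top half and 666 in the bottom half, and `format_community` reverses that.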

Now, one of the most important community tags you can use is surprisingly not standardised. It is the blackhole tag. The idea is that you can mark a route sent around by BGP that is "Do not route this", and just throw away any traffic to this prefix. The prefix is usually one address (IPv4 /32 or IPv6 /128).
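As a rough illustration of what "just throw away any traffic to this prefix" means for forwarding, here is a Python sketch of a longest-prefix lookup in which a more specific blackhole entry overrides the normal route. The table layout, the `BLACKHOLE` sentinel and the `next_hop` function are assumptions for the example, not how any particular router does it.

```python
import ipaddress

BLACKHOLE = object()  # hypothetical sentinel next hop meaning "discard"


def next_hop(routes, destination):
    """Return the next hop for destination, or None if the packet is dropped."""
    address = ipaddress.ip_address(destination)
    matches = [
        net for net in routes
        if net.version == address.version and address in net
    ]
    if not matches:
        return None  # no route at all
    # Longest prefix wins, so a /32 blackhole beats the covering /24.
    best = max(matches, key=lambda net: net.prefixlen)
    hop = routes[best]
    return None if hop is BLACKHOLE else hop


# Example table using documentation addresses (RFC 5737).
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "customer-link",
    ipaddress.ip_network("192.0.2.55/32"): BLACKHOLE,
}
```

With this table, traffic to 192.0.2.55 is discarded while the rest of 192.0.2.0/24 is still forwarded to `customer-link`.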

There are two key ways an ISP can use Blackhole routes...

One is within their network, ensuring that their IBGP spreads the route and tags it so that each and every one of their routers knows not to route any traffic for the specified prefix. This helps ensure packets arriving at any ingress are dropped immediately to mitigate damage. It does not help much if the ingress is flooded though.

The other is for an ISP to tag the route and announce to their peers, and transit, so that they do the same. This helps avoid flooding the ingress points as the peer/transit is filtering in their network.

This is all quite important for managing Denial Of Service (DOS) attacks. Even if the target is one IP, which is not always the case, the traffic can be crippling. So an ISP that can tell their peers and upstream transit providers not to send the traffic to them, for that one IP, can stay on-line. The transit provider can spread this to all of their ingress points ensuring their network is not flooded further, and maybe even to their peers to push back further to the source of the traffic.

Over the last few days, for reasons that will be obvious if you have followed A&A status pages, I have been working on ways to make FireBricks smarter in their handling of Blackhole routes.

I could leave it to FireBrick customers to make rules for the way community tags are processed, but even that would not allow a route to be treated as a black hole; the rules could only drop the route. So what I did was create ingress and egress blackhole community tag handling.

Anyone sending us a route with a specific community (for A&A it is 20712:666) has the route treated as a blackhole route. Obviously the route has to pass any other input filters, so customers can only announce their own IPs to us. This route spreads around our network so every router knows it is a black hole.
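The ingress logic can be sketched roughly as follows in Python. This is not FireBrick code: `accept_route` and the `allowed_prefixes` list are hypothetical, and the only detail taken from above is the 20712:666 community.

```python
import ipaddress

BLACKHOLE_COMMUNITY = "20712:666"  # the A&A value quoted above


def accept_route(prefix_text, communities, allowed_prefixes):
    """Return (prefix, is_blackhole) if the route is accepted, else None."""
    prefix = ipaddress.ip_network(prefix_text)
    # Input filter first: the sender may only announce space inside its own prefixes.
    permitted = any(
        prefix.version == allowed.version and prefix.subnet_of(allowed)
        for allowed in allowed_prefixes
    )
    if not permitted:
        return None
    # Only a route that passed the filter can be marked as a black hole.
    return prefix, BLACKHOLE_COMMUNITY in communities
```

The ordering matters: the community is only looked at after the normal input filter, so a customer cannot blackhole someone else's address.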

Secondly, announcing any blackhole route to peers is special. We only send on IBGP (ensuring our black hole community tag is present), or if configured we send on EBGP with the peer's black hole community tag, such as 2914:666 for NTT.
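A minimal Python sketch of that egress rule, under some assumptions: the function name and arguments are invented for illustration, and removing our own tag before adding the peer's tag on EBGP is a guess at sensible behaviour rather than something stated above.

```python
OUR_BLACKHOLE = "20712:666"  # our own tag, as quoted above


def egress_communities(communities, is_blackhole, peer_is_ibgp,
                       peer_blackhole_community=None):
    """Return the communities to announce to this peer, or None to withhold."""
    if not is_blackhole:
        return set(communities)  # ordinary routes are untouched here
    if peer_is_ibgp:
        # Internal peers always get the route, with our tag guaranteed present.
        return set(communities) | {OUR_BLACKHOLE}
    if peer_blackhole_community is None:
        # External peer with no blackhole tag configured: do not announce at all.
        return None
    # Assumption: swap our tag for the peer's own blackhole tag.
    return (set(communities) - {OUR_BLACKHOLE}) | {peer_blackhole_community}
```

For example, a blackhole route sent to NTT would go out tagged 2914:666, while an external peer with no blackhole community configured would not be sent the route at all.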

This means that anywhere in our network, even from a customer, we can create a blackhole route, and our whole network knows - all routers will drop traffic to the target IP. It also means we then tell all transit and peers that have blackhole community tags to do the same. Obviously if we can do this at peering points as well as transit then it is a massive help.

We have even made a system so a "connected" DSL line that is subject to a DOS attack can be marked as blackhole routed and that route go around our network and to peers and transit for a few minutes to mitigate the attack automatically!
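The "for a few minutes" part amounts to a blackhole entry with an expiry time. A small Python sketch of that idea, where the class name, the 300 second default and the explicit `now` argument are all assumptions for illustration:

```python
class TimedBlackholes:
    """Track addresses that are blackholed until a hold time has passed."""

    def __init__(self, hold_seconds=300.0):
        self.hold_seconds = hold_seconds
        self._expiry = {}  # address -> time at which the blackhole lapses

    def mark(self, address, now):
        """Start (or restart) the blackhole for an address under attack."""
        self._expiry[address] = now + self.hold_seconds

    def active(self, now):
        """Return the addresses still blackholed, forgetting lapsed ones."""
        self._expiry = {a: t for a, t in self._expiry.items() if t > now}
        return set(self._expiry)
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the sketch easy to test; a real system would also have to withdraw the announced route when an entry lapses.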

Of course, there are attacks this will, by no means, fix. All it means is that an ISP is better able to partition out IPs as under attack and help avoid impact on other customers. As a feature it is good for FireBrick to be able to offer this to ISPs.

What is odd is that there is not a pre-defined standard blackhole community tag.

P.S. Every one of the millions of DOS attack packets per second will probably need to create an "Internet Connection Record" under the new IP Bill, which will mean DOSing the DPI boxes the government want installed.

12 comments:

  1. There is a draft standard, draft-ymbk-grow-blackholing-01, well known community 65535:666.

  2. Is it possible that the black hole route mechanism could itself be used as a DoS attack on a particular user?

    Replies
    1. No more than BGP can be generally. E.g. if you trust a peer to send such a route they can either take all the traffic or black hole it.

  3. Great, is there any documentation on the IP address of the iBGP peers we should use to announce a blackhole? bottomless.aa.net.uk by chance?

    Replies
    1. Err, customers that we provide BGP to (a handful) can include the 20712:666 community for blackholes. This is mostly some peers and customers with direct Ethernet connections in to us. We don't do BGP with broadband customers normally.

  4. RevK, any plans to publish the source of the Firebrick firmware? It's my understanding that you created the firebrick because you couldn't trust the reliability or security of other, closed-source routers, but without source, your customers are in exactly the same position you were when you started the whole exercise.

    Replies
    1. This has been the subject of a long debate - even publishing the source, you cannot be sure that what we put in FireBricks is the same as the source we published. To do that you would have to be able to build and sign code yourself and install on your FireBrick. Even then, you cannot trust that our boot loader code does not install some sort of shim in the Ethernet interrupt handling that does something underhand - you'd have to be able to build the boot loader and install via JTAG. Even then you are trusting (as we do) the chipset manufacturer (which may not be that stupid). At the end of the day you basically have to trust us to some extent, and if you don't then providing the source does not really solve that. We sign all code and only allow upload of signed code in to the brick, so this gives customers some security that the code has not been messed with by someone else, which we think is probably for the best at the moment.

    2. It is not out of the question that some code may be made open source. We already publish the code used in the FireBrick to generate the graphs in png, for example.

    3. That's true, but there's more than one threat vector here:
      1. Someone at A&A with access to the baseband signing and verification level code is malicious.
      2. Someone at A&A who can contribute to the code but can't compromise the base system is malicious.
      3. Your code has bugs in it.

      Nothing can defend against 1, but it doesn't have to - if we don't trust you not to be malicious, we shouldn't buy your hardware, full stop. #2 seems unlikely, but possible, and #3 is a virtual certainty. Publishing source mitigates both 2 and 3, by allowing people to inspect the code running on their router.

      At the least, I'd hope you recognize that for anyone with the same reservations as you about routers, the firebrick is no more a solution than any other offering. Without the source, the only person who can impute more trust to the firebrick than to other routers is you.

    4. Indeed - 1 and 2 are unlikely as we have very few people who work on the code and can contribute, and very tight control on who can sign code. As you say 3 is a certainty. That is why, as I say, some code may get released in time. I do recognise the issue though. However, as a small UK based development team, I think it may be easier to trust us than a large foreign faceless corporation - but YMMV.

    5. What's the major drawback of publishing the source code?

      Competitive advantage? Since you're writing code to standards is there much unique code that competitors could take advantage of?

    6. Even though we are working to standards in terms of the way we interact with other systems, there are quite a few aspects of the Firebrick that work in some unique ways. So yes, competitive advantage, sorry.
