RevK®'s ramblings: BGP

Border Gateway Protocol is a thing that happens very much behind the scenes in the Internet and not something anyone outside the industry should have to know anything about. So this post is going to try and really dumb down some of the technical issues.

Firstly I'll try and explain what BGP is, and a couple of the challenges that have come up over the years. Some were an urgent issue that made us all realise a risk that was not known before. The others are more of a gradual change in best practice that needs doing, in my view.

Extra dumbed down

For the Internet to work - there has to be a "road map" so that your Internet provider can direct traffic.
Roads change, and so there is a way to update this road map with new instructions (that's BGP).
There can be errors in these instructions, and bad people can give wrong instructions.
This is all something that is being worked on. It is complicated.

Thanks to Simon Crowe #FBPE (@UHDDreamer) for some inspiration on the above.

What is BGP

First off, BGP is the way internet providers manage routing of internet packets. It involves, normally, an ISP communicating with another ISP over some link to say what they can route for them.

It is not fundamentally complicated, and I recall one occasion talking to someone working for a major peering point about us plugging in to them. We (AAISP) use FireBrick routers, and I had personally written the Ethernet drivers, IP, TCP, and BGP protocols from scratch for our equipment. We plugged it in and it worked as expected. The idea that we were not using Cisco, or Juniper, or some other common vendor, was a shock at first, but bear in mind that not only are these all well-defined and published standard protocols, they are designed to allow a degree of tolerance to errors.

Our code worked as designed and to the standard, and in some ways was way faster than some vendors. I was pretty proud of the design.

To take part in BGP you need an agreement with another ISP, either "peering", which is ISP to ISP, or "transit", which is where an ISP gets "the Internet" from some larger company. In both cases, and especially the latter, there are filters on what you can "announce" to the world via BGP.

As a system, this should work. I cannot "announce" someone else's routes to transit - they won't let me. I cannot "pretend" to be some part of Facebook's network, for example, and hijack their traffic. If everyone that allows a connection to the BGP network had such filters all would be well, and mostly it is.
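A minimal sketch of the kind of announcement filter described above, assuming the provider simply holds a list of prefixes registered to each customer. The prefixes are documentation ranges and the names `ALLOWED` and `accept_announcement` are made up for illustration; a real filter would normally also limit how specific a route may be, which is not modelled here.

```python
import ipaddress

# Hypothetical customer record: the prefixes this customer may announce.
# 192.0.2.0/24 and 2001:db8::/32 are documentation ranges, used as examples only.
ALLOWED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def accept_announcement(prefix, allowed=ALLOWED):
    """Accept a route only if it sits inside a prefix registered to the customer."""
    net = ipaddress.ip_network(prefix)
    # The version check comes first because subnet_of() raises TypeError
    # when asked to compare an IPv4 network with an IPv6 one.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


print(accept_announcement("192.0.2.0/24"))     # the customer's own route
print(accept_announcement("198.51.100.0/24"))  # someone else's route
```

The first call is accepted and the second is refused, which is the "they won't let me" behaviour: the route never gets past the first provider that applies the filter.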

The issue is that some parts of the world are not as robust, and so rogue routes can be announced. It can be (and often is) a mistake, or it can be malicious. Hijacking someone's routes can be a way to break security (getting new certificates for https), or just causing disruption to their network and traffic. It is a concern for the industry as a whole.

Just to explain, unlike your broadband router which has only one route to the internet, your ISP has many peers and transit and routes to send data. That is why BGP is needed in the first place.

Path overload issue

One incident that happened was where someone made a simple typo on a configuration (more details here). A setting which they thought was a number to quote was in fact how many times to quote it. This created a message in the routing which was unexpectedly long and caused some special edge case in the code for longer data.
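The mix-up described above can be sketched as follows. This is an illustration only: the function `prepend`, the AS numbers (taken from the range reserved for documentation) and the example path are all assumptions, and how a real router limits or handles such a count is vendor-specific and not modelled here.

```python
def prepend(as_path, own_as, times):
    """Return as_path with own_as added to the front `times` times."""
    return [own_as] * times + list(as_path)


OWN_AS = 64500         # hypothetical AS number from the documentation range
path = [64501, 64502]  # hypothetical path learned from elsewhere

intended = prepend(path, OWN_AS, 3)   # operator wants the path to look 3 hops longer
typo = prepend(path, OWN_AS, OWN_AS)  # AS number typed where the count belongs

print(len(intended))  # 5
print(len(typo))      # 64502
```

The same one-line setting produces either a slightly longer path or an enormous one, depending on which number goes where.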

The problem was a bug in some makes of router which meant that it could not cope, and broke routing. It created invalid data that was sent on.

I hate to say this, but my memory is sketchy on the details. The solution was to not forward invalid data. We realised this and ensured FireBricks would not do so (a config setting with a default not to, called "ignore-bad-optional-partial") even though the specification said we should. We had new code within days to ensure FireBricks could not be part of the problem - even before the RFC (standard) on this was created.

Sometimes the industry has to act quickly as, even though the cause of the issue was a mistake, it could be exploited as an attack.

TCP RST issue

Another issue that became apparent was the way the BGP links between ISPs are set up. They use a normal TCP connection. Now TCP works on IP and IP has a "TTL" or "hop count" which stops IP packets going too far. A convention in BGP (not part of the spec for BGP or TCP) was to set up the TCP session to "peer" links with a one hop TTL. This means the TCP connection cannot go more than one hop to the directly connected router. This makes sense as the peer is directly connected on a link one hop away.

The problem that came up was that someone could inject a TCP packet called a RST, with a faked source address, sent on the Internet, which when it arrives closes the TCP connection for the BGP session itself. This drops all routing, and causes disruption. Repeatedly done it can take down a link, or set of links, completely and cause a lot of problems.

The first fix was a way to digitally sign the TCP packets. We, at FireBrick, created this feature to allow it to work for BGP, and a few of AAISP's peers required signed BGP sessions. This works using a password at a low level, so the rogue RST packet is ignored.
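A deliberately simplified sketch of the idea, assuming something along the lines of the TCP MD5 signature option (RFC 2385) commonly used for BGP sessions. The real option computes its digest over the TCP pseudo-header, the TCP header and the segment data followed by the password; here the digest covers only some stand-in bytes plus the password, and the names `sign` and `accept` are made up.

```python
import hashlib
import hmac


def sign(segment, password):
    """Digest over the segment followed by the shared password (simplified)."""
    return hashlib.md5(segment + password).digest()


def accept(segment, signature, password):
    """Accept a segment only if its digest matches the one we compute ourselves."""
    return hmac.compare_digest(sign(segment, password), signature)


shared = b"agreed-with-peer"  # hypothetical password configured on both routers
rst = b"RST for the BGP session"  # stand-in for the bytes of a TCP segment

print(accept(rst, sign(rst, shared), shared))    # sent by the real peer
print(accept(rst, sign(rst, b"guess"), shared))  # forged without the password
```

Someone injecting an RST from elsewhere does not know the password, so cannot produce a matching digest, and the packet is dropped before it can close the session.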

It turns out there is a much simpler way to fix the issue, called "TTL security". Instead of using a hop count of 1, use a hop count of 255 (the maximum), but make sure the peer checks it is 255. The reason this works is that a packet from anywhere else on the Internet will have its hop count fall below 255, as it drops by one at each "hop" on the way.

Again, FireBrick implemented TTL security, not just setting the required hop count, but checking it based on number of hops allowed/expected (usually no intermediate hops).
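The check can be sketched in a few lines. This is an illustration of the principle rather than FireBrick's code: `ttl_on_arrival` and `accept` are made-up names, and `hops_allowed` stands in for the "number of hops allowed/expected" setting mentioned above.

```python
MAX_TTL = 255


def ttl_on_arrival(sent_ttl, routers_crossed):
    """Each router that forwards a packet lowers its TTL by one."""
    return max(sent_ttl - routers_crossed, 0)


def accept(arrival_ttl, hops_allowed=0):
    """Accept only packets that cannot have crossed more routers than allowed."""
    return arrival_ttl >= MAX_TTL - hops_allowed


# Directly connected peer: no routers in between, so the TTL is still 255.
print(accept(ttl_on_arrival(255, 0)))
# Attacker five routers away: even sending with the maximum TTL, it arrives as 250.
print(accept(ttl_on_arrival(255, 5)))
```

The sender cannot start higher than 255, so there is no value a remote attacker can choose that still reads 255 on arrival. With the older convention of sending a TTL of 1, nothing was checked on receipt, so a forged packet from anywhere was accepted.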

Using RPKI

There are still issues with BGP, even with all of these steps.

The main one is that someone can "inject" a route into the system that is not genuine. They can do so alongside the genuine route, or inject a more specific route. This hijacks all of the traffic.
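The "more specific route" case works because routers prefer the longest matching prefix. A minimal sketch of that rule, using documentation prefixes and made-up AS labels; the case of a rogue route alongside the genuine one depends on BGP path selection and is not modelled here.

```python
import ipaddress


def best_route(table, destination):
    """Pick the matching route with the longest prefix, as routers do."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, via) for net, via in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical routing table holding only the genuine route.
table = [(ipaddress.ip_network("198.51.100.0/24"), "genuine AS64500")]
before = best_route(table, "198.51.100.10")

# A rogue, more specific announcement covering the same addresses is injected.
table.append((ipaddress.ip_network("198.51.100.0/25"), "rogue AS64511"))
after = best_route(table, "198.51.100.10")

print(before)  # genuine AS64500
print(after)   # rogue AS64511
```

The genuine route is still in the table, but it no longer wins for any address the rogue prefix covers.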

As I said before, where transit providers check their customer routes, this cannot happen. But some countries are a bit more lax.

The "fix" is double edged - it involves a way to certify that a route is correct, specifically that it is to the right "autonomous system". But the downside is that puts someone in control of certifying the route is correct. Who has that power?

This was a controversial issue in that, for example, the whole of Europe is controlled by RIPE. So if a Dutch court demands that RIPE remove a route, they would have to. This puts huge power in the Dutch courts. The same applies in the US and every other registry where a local court could command a change. To be clear, actual routing is handled by the ISPs, but the issue comes when they all work on one authority as to what is valid. I am not sure whether that has now been fixed in RPKI, but happy to be corrected on this point.

The other issue is that certification can lead to mistakes, causing routes not to work based on some technicality.

Not everyone is checking these certificates, and even then the system will not be bullet proof if the origin AS is spoofed (I think). So any errors will cause partial failures. These are massively difficult to diagnose. What does an ISP do when just some of the Internet cannot see some of its network? In most cases the networks not routing will have no contract or direct relationship with the ISP in question. That is hard to diagnose and fix.

In the long term, this is generally good. Even with the risk of a court attack, the industry can work around if needed. That is a last resort, and measures to avoid rogue routes are a good idea.

If the major transit providers start filtering routes by checking RPKI then that alone will solve the problem of rogue routes - but if they all filtered what they receive from customers anyway, that would avoid the issue without RPKI. So is it worth it?

But as I say, this is all behind the scenes policy and technical issues for ISPs and transit providers. It will be sorted by ISPs and industry as a whole around the world. We are all working to improve the security and reliability of the Internet.

Who should do what and when?

[new section after original post] I have been learning more on the whole RPKI thing. Overall it is a good idea as it blocks some types of attack. It is not perfect, it does not block all types of attack, and is, itself, prone to new types of attack via courts and also new mistakes, but it helps. It helps a lot with some types of mistakes, which have been a cause of issues as you can see above. It is best practice, which is important. So that is why we (AAISP) are doing it.

There seem to be three steps that make this work.

1. Everyone should be signing their routes - i.e. ensuring they have signed route details saying which routes via which AS, so they can be checked by the Internet as a whole. AAISP have signed routes for some time and are currently working on ensuring some hosted routes for customers are also signed. This is the first step, else RPKI could not work at all - you cannot check routes if you have nothing against which to check them.
2. The big players, the transit providers, need to filter based on RPKI. This, with step 1, basically stops all route injection attacks in their tracks, and problem solved.
3. Smaller edge ISPs should also filter routes. This is mainly to catch the peering sessions and pick up mistakes. If transit are filtering, this is a mopping up exercise - an attack or mistake could impact a small group of peering ISPs maybe, not the Internet as a whole. Such ISPs probably already filter peering to some extent anyway, but RPKI is a good start for making this better and more automated.
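The check that the signed route details make possible can be sketched as below. This follows the usual valid / invalid / not-found outcomes of RPKI origin validation, but it is an illustration under assumptions: the `Roa` class, the `validate` function, the prefixes and the AS numbers are all made up, and fetching and cryptographically verifying the signed data is left out entirely.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Roa:
    """Simplified signed route detail: which AS may originate which prefix."""
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    max_length: int  # most specific prefix length the holder permits
    origin_as: int


def validate(prefix, origin_as, roas):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covering = [
        r for r in roas
        if r.prefix.version == net.version and net.subnet_of(r.prefix)
    ]
    if not covering:
        return "not-found"  # nothing signed for this space, so nothing to check
    for r in covering:
        if r.origin_as == origin_as and net.prefixlen <= r.max_length:
            return "valid"
    return "invalid"  # signed details exist, and the announcement contradicts them


roas = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]

print(validate("192.0.2.0/24", 64500, roas))     # valid
print(validate("192.0.2.0/24", 64511, roas))     # invalid
print(validate("192.0.2.0/25", 64500, roas))     # invalid
print(validate("198.51.100.0/24", 64500, roas))  # not-found
```

Filtering in steps 2 and 3 then amounts to dropping whatever comes back "invalid". The "not-found" result is why step 1 matters, and the check only looks at the origin AS as announced.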

So if you felt things in the industry were not moving fast enough, you could make a site and allow tweets saying people have "unsafe" Internet. But if you did that, should you say that the edge ISP has unsafe Internet, or maybe work out which transit they are using and say that transit provider is unsafe? Maybe if the edge ISP is not signing routes, highlight that. But really, who should you "shame"? Edge ISPs filtering is a good idea, but it is the last step, there for completeness; the signing and the transit filters are what matter here. Personally, don't try shaming people, talk to them!

But to be clear, AAISP were doing 1, we are now nagging transit re 2, and we are working on 3. The last stage is complex as it means development and testing in our core routers - not something you do during a pandemic.

It is interesting that, even with recent publicity, we have one customer concerned that we will be deploying RPKI filtering - feeling it will break things and even accusing us of breach of contract. This kind of shows it is not a simple matter to deploy quickly.

There is also an excellent post by Andrew Aston on the issue of shaming ISPs: here.

6 comments:

Terry F. - Sunday, 19 April 2020 at 19:48:00 BST
Regarding path overload incidents:

There have been a few over the years but the ones that stick in my mind are https://dyn.com/wp-content/uploads/2013/05/nanog54-cowie.pdf and https://dyn.com/blog/the-flap-heard-around-the-world/

Primarily caused by folk not reading the MikroTik RouterOS documentation properly:

https://wiki.mikrotik.com/wiki/Manual:Routing/Routing_filters

The parameter I believe you are thinking of is 'set-bgp-prepend'.

chrisl - Monday, 20 April 2020 at 00:20:00 BST
"Path overload issue [struggling to find the link to history on this - will update if anyone has details]"

Was it this? https://dyn.com/blog/longer-is-not-better/

Terry F. - Tuesday, 21 April 2020 at 22:17:00 BST
I think I need to balance what I have said publicly with what you have written here.

There are idiots out there on the Internet who do not publish valid route/route6 objects in the relevant IRR data sources and then when this is pointed out to them by the customer who has that prefix assigned to them; they need to be spoon-fed the relevant text to submit to the RIPE database in order to create the necessary object in order to be able to use the prefix that they assigned to their own customer.

I have seen at least one similar error with RPKI; if you change the origin AS for a prefix, you need to publish an additional ROA, update your existing one or simply revoke them all in order to ensure global reachability of that prefix.

I think Cloudflare would have put their point across better if they had advertised two 'invalid' prefixes with one prefix testing for those who had implemented strict IRR-based/AS-path filtering and the other prefix for those who had implemented RPKI - there are multiple grades of 'safe' here - and those ISPs who haven't even bothered to implement IRR-based filtering should rightfully be called out for it as that has been The Right Way(tm) to do it for many years prior to RPKI even being on the drawing board.

My concern with any provider manually blocking a prefix is made nicely by your own marketing at https://www.aa.net.uk/broadband/real-internet/ headed "Why are we against censorship?":

"Censorship of any sort is the thin end of the wedge and must not be taken lightly."

The address space is Cloudflare, being advertised by Cloudflare and content is being served by Cloudflare hosts from that space.

One thing that Cloudflare could easily do to combat those who manually block their 'invalid' prefix is to periodically cycle what prefix is used for this purpose very much like how spammers frequently switch source addresses/netblocks to ensure they don't trip rate-limits imposed on them by others.

Then it becomes a game of whack-a-mole... which can become even more embarrassing for an ISP that promotes an anti-censorship stance if/when Cloudflare start serving legitimate content from one of the manually-blocked prefixes which is no longer 'invalid' as far as RPKI is concerned due to publication of a valid ROA.

No ISP can keep up with a game like that until they actually implement RPKI.

I guess that the moral of the story is that manually-blocked prefixes (sometimes even bogons) almost always have a habit of coming back and biting you when you least expect it.



2020-04-19
