Wednesday, 7 November 2012

OSPF blues

Making progress. I have (subject to lots of testing) reached a major milestone at last, which is that we now manage the LSA database.

This means discovering neighbours (Hello packets), electing the designated and backup router, starting associations with other routers correctly, exchanging LSA headers, loading the database, accepting updates and sending acks, sending flooding updates, originating router and network LSAs when needed, refreshing LSAs we originate, and generally being a part of an OSPF community.

There will be bugs! I have not tried area-border stuff at all. But this is, none the less, a major milestone. Only two key steps left: Injecting routes in to OSPF, and processing routes from OSPF (i.e. doing the SPF algorithm, etc).

What I am working on this morning, and have a lot of progress on for IPv4 at least, is injecting "external" LSAs. i.e. where the FireBrick has some external routes and wants to tell the OSPF world about them. This is carefully controlled - you don't want to leak the full table of BGP in to OSPF it seems. This will mostly be used for L2TP connected routes.

When coding, you have to cover every possible outcome in the code, so you find edge cases that should never happen and you have to cater for them. Sadly I have run in to a snag already.

In OSPFv2 the LSA-ID for an external route is the IP address of the route, but it is allowed to use the "host bits" to differentiate LSAs that overlap. e.g. if you have two routes with the same network address but different prefix lengths, they cannot both use that address as the LSA-ID. There is an algorithm to try and sort out clashes, which is itself messy as it means changing previously announced routes when adding a new one that overlaps.
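As a rough illustration of the host-bits idea, here is a simplified sketch in Python. It is not the FireBrick code, and it is only loosely modelled on the approach described in RFC 2328 Appendix E: try the network address first, then the address with all host bits set, giving shorter prefixes first pick. The function names and example prefixes are made up for illustration.

```python
import ipaddress


def candidate_lsa_ids(net):
    """Candidate LSA-IDs for a prefix: network address, then all host bits set."""
    ids = [net.network_address]
    # For a /32 there are no host bits, so there is only one candidate.
    if net.broadcast_address != net.network_address:
        ids.append(net.broadcast_address)
    return ids


def assign_lsa_ids(prefixes):
    """Map each prefix to an LSA-ID string, or None if no candidate is free.

    Shorter (less specific) prefixes choose first, so they keep the plain
    network address and longer prefixes fall back to using host bits.
    """
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen)
    used = set()
    assigned = {}
    for net in nets:
        assigned[str(net)] = None
        for cand in candidate_lsa_ids(net):
            if cand not in used:
                used.add(cand)
                assigned[str(net)] = str(cand)
                break
    return assigned
```

For example, `assign_lsa_ids(["10.0.0.0/8", "10.0.0.0/16"])` gives the /8 the ID 10.0.0.0 and the /16 the ID 10.0.255.255. A prefix maps to `None` when both of its candidates are already taken. Unlike a real implementation, this assigns everything in one pass rather than re-originating existing LSAs as routes come and go.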

The snag is that this does not work for all cases. e.g. (and we'll test this on bird later), what if I have a /31 and both of the /32s within it? There are only two LSA-IDs that could be used, the two addresses in the /31, but I have three routes to announce.

Now, one can consider that the /31 is not needed as it is covered completely by the two /32s. That is fine, until one of the /32s is withdrawn. Now you have to reinstate the /31 covering route. This means every time you withdraw a route you have to check for all containing routes that may not have been announced, just in case. Arrrrg!
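The "is the covering route needed?" check can be sketched like this. It is a minimal Python illustration under my own assumptions, not the actual implementation: the helper names are hypothetical, the addresses are documentation examples, and it simply recomputes the announced set from scratch after every change rather than tracking containing routes incrementally.

```python
import ipaddress


def is_fully_covered(net, routes):
    """True if more-specific routes in `routes` cover every address in `net`."""
    specifics = [r for r in routes
                 if r.prefixlen > net.prefixlen and r.subnet_of(net)]
    # collapse_addresses merges adjacent/overlapping networks; if the
    # more-specifics collapse to exactly `net`, nothing is left uncovered.
    return list(ipaddress.collapse_addresses(specifics)) == [net]


def routes_to_announce(prefixes):
    """Drop any route whose whole address space is covered by more-specifics."""
    nets = {ipaddress.ip_network(p) for p in prefixes}
    return {str(n) for n in nets if not is_fully_covered(n, nets)}


# With all three routes present, the /31 is suppressed.
table = ["192.0.2.0/31", "192.0.2.0/32", "192.0.2.1/32"]
before = routes_to_announce(table)

# Withdraw one /32: there is now a gap, so the /31 has to come back.
table.remove("192.0.2.1/32")
after = routes_to_announce(table)
```

Here `before` holds just the two /32s, and `after` holds the /31 plus the remaining /32. Recomputing everything is the lazy way to do it; doing it incrementally on each withdrawal is where the extra special cases come from.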

So, once again, OSPF is piling more special cases in my code, which is more to go wrong, and more to test.

Some protocols and algorithms are "elegant" and "simple". I really do start to feel that OSPF is not either of these. Sorry.

P.S. I wonder what bird does with as the LSA-ID is reserved for

Update: OK, coding for IPv4 (OSPFv2) working, picking an LSA-ID and handling clashes by removing the covering route where all subordinate space is occupied, and re-instating it when there is a gap. We are treating the same, so that means we will not use for default route if there is a route as well, but that is not exactly likely. Now to originate IPv6 routes and that will be next milestone!


  1. Which is why we... and A&A... do this kind of thing with BGP rather than OSPF.

    OSPF had me tearing my hair out on more than one occasion and that was simply dealing with the bugs in other implementations; writing your own is likely to have you completely bald by the time I see you next :-P

  2. The underlying algorithm (Dijkstra's shortest path algorithm) is very elegant. The protocol on top shows lots of signs of being badly hacked up based on IS-IS without a huge amount of thought about how OSI networking and IP networking differ.

  3. I would think that IPv6 would be much easier as there are very very few valid subnets?

    1. Err, even just considering /64 subnets there are more possible routes than the 32 bit LSA-ID field can take :-) What makes IPv6 easier is the LSA-ID is not tied to the IP address/prefix in the same way as IPv4, so can just be a sequence number. I have IPv6 routes going in to OSPF now, and need to do loads of testing before finally taking routing data from OSPF and injecting in to the internal routing table. So getting close!