RevK®'s ramblings: OSPF final stages

2012-11-12

OSPF final stages

Well, I am working on the last part of OSPF now, and possibly the more interesting part. I now have my routes injected in to OSPF as LSAs, and flooded out. What I need to do as the final step is process the Dijkstra shortest path algorithm and inject routes from OSPF LSAs in to my routing table.

But I have to say this protocol is really getting to be a tad special, and my testing has been tricky in some areas. Even after this last stage there is a lot of testing.

A good example is the LSA sequence number. It is 32 bit, and not allowed to change more than once per 5 seconds, so has a time frame of 680 years. If you restart OSPF it starts again, so this means, with no special processing OSPF can run for 680 years.

However, the spec is quite detailed on this. To be fair, this is good - one should specify the edge case, even if unlikely. Ignoring the "I'll be dead by then" cases is what got us in to Y2K in the first place. Though we are talking a single OSPF network running with no breaks for that long, not just some long distant date. That seems a tad less likely.

So, firstly, how would I do it?

A 32 bit counter is fine, maybe start from 0, as that is sane, and maybe make all comparison relative. This means a "circular" comparison. Basically subtract A from B and test top bit (easy to do in C and in assembler). This means that any sequence is seen as "newer" than another if less that 2^31 later in sequence (340 years minimum). This type of logic is common and very easy. When you start a new OSPF instance you start at sequence zero, but in theory you can encounter an "old" entry from a peer, if "older" you ignore (and so do peers), and if "newer" you update your sequence (as the originator of the LSA) to what you see, plus 1. Simples. This start-up logic is in the OSPF spec anyway already.

How does OSPF do it?

For a start they define the 32 bit counter as signed.
They start at 0x80000001 (-ve max, plus 1).
They outlaw 0x80000000.
Comparison is linear, not circular.
If you wrap to 0x80000000 then you have to prematurely age the LSA at 0x7FFFFFFF, flood, and when all acknowledged you replace with a new LSA with 0x80000001 and flood that - thus creating a brief gap without the LSA.

[exclamation mark from end if each of the following points removed so I don't look insane]

OK, well, I have coded this, but really, I was so tempted to say WTF, I will be long dead by then. Does nobody actually think when designing this stuff? Alternative design rules eliminate the complexity and make less code, less testing, less special cases, and more reliability.

Oh, and to add to the fun, OSPF metrics are 24 bit but BGP are 32 bit, so mapping these will be, err, fun!

So, really, really, I plan to have FB2500 and FB2700 alpha releases with OSPF for testing this week, maybe tomorrow or Wednesday. This has taken way longer that BGP even writing the whole routing core as well. I started at 6am today and still working!

3 comments:

admcdMonday, 12 November 2012 at 21:32:00 GMT
The linear sequence space was a very deliberate decision. The use of a circular sequence space was what caused the "ARPANET sequence bug".

There is a nice explanation in Moy's book on OSPF. (Searching for "arpanet sequence bug" in Google Books brings up the relevant page.)

Protocol design is often assumed to be easy, but it isn't. There are many ways to get it wrong.
ReplyDelete
Replies
admcdMonday, 12 November 2012 at 21:49:00 GMT
Just to add to my previous comment: The ARPANET incident is documented more fully in RFC789.
ReplyDelete
Replies

Add comment

Comments are moderated purely to filter out obvious spam, but it means they may not show immediately.

RevK^®'s ramblings

2012-11-12

OSPF final stages

3 comments:

Faikin board ATE

Rules

Rules

Report Abuse