RevK®'s ramblings: CISCO and ARP?

2016-03-25

CISCO and ARP?

The FireBrick has quite a good ARP handling subsystem, including exponential back-off, configurable ARP timeouts and so on. It has served us well, but we have recently encountered a slight problem talking to a CISCO Nexus switch.

So I did some tests - and would love to know if this is typical. Any CISCO experts reading this may be able to comment.

Testing with arping from Linux, I could see that the CISCO responded to only some of my ARP requests: maybe one in five, though not very consistently. This is a tad odd, and may be down to some general ARP rate limiting.

On top of that, when it did respond, it did so after 2.99 seconds. This was very consistent; I had to send one ARP request at a time with arping to confirm it.

I have to wonder what the hell it is doing! From a coding point of view, holding on to the ARP request or reply for that length of time is more work than just answering the ARP right away. I am at a loss as to what is going on.

For comparison, Linux timed a FireBrick's response at 180µs, and the FireBrick answered every ARP.

Anyway, it means I have had to tweak the way the ARP system renews ARPs so that it keeps trying for a bit longer; otherwise, every now and then, the CISCO vanishes for a few seconds.
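The renewal problem can be sketched as an exponential back-off schedule. This is a minimal Python illustration, not the FireBrick's code: the 0.125s first gap, the doubling factor and the attempt counts are hypothetical, and the model assumes the switch answers only some requests (the `answered` set) and always after a fixed delay. It shows that if the retry window closes before about 3s, no reply can land in it, however many requests are sent.

```python
from typing import Optional


def backoff_schedule(initial: float, factor: float, attempts: int) -> tuple[list[float], float]:
    """Return (send times, give-up time) for an exponential back-off ARP retry.

    Each request goes out one gap after the previous one, and the gap is
    multiplied by `factor` each time. After the last request we wait one more
    gap before declaring the neighbour unreachable.
    """
    sends: list[float] = []
    t, gap = 0.0, initial
    for _ in range(attempts):
        sends.append(t)
        t += gap
        gap *= factor
    return sends, t


def first_reply(sends: list[float], give_up: float, delay: float,
                answered: set[int]) -> Optional[float]:
    """Earliest reply arrival time, or None if no reply lands before give_up.

    `answered` holds the indices of requests the responder actually answers;
    every answer is assumed to arrive exactly `delay` seconds after its request.
    """
    arrivals = [sends[i] + delay for i in answered if i < len(sends)]
    in_time = [a for a in arrivals if a <= give_up]
    return min(in_time) if in_time else None


# Hypothetical numbers: 0.125s first gap, doubling, 2.99s responder delay.
sends, give_up = backoff_schedule(0.125, 2.0, 4)
print(give_up, first_reply(sends, give_up, 2.99, {0}))  # 1.875 None
sends, give_up = backoff_schedule(0.125, 2.0, 6)
print(give_up, first_reply(sends, give_up, 2.99, {0}))  # 7.875 2.99
```

With four attempts the schedule gives up at 1.875s, before even the first request's 2.99s reply can arrive; six attempts stretch the window to 7.875s and catch it.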

Oh, and yes, they still look like this, with some arbitrary padding up to the minimum Ethernet packet size.

09:40:20.688429 ARP, Request who-has 91.240.176.1 tell 91.240.176.254, length 46
 0x0000:  0001 0800 0604 0001 0003 971d c009 5bf0  ..............[.
 0x0010:  b0fe 0000 0000 0000 5bf0 b001 474e 5520  ........[...GNU.
 0x0020:  5465 7272 7950 7261 7463 6865 7474       TerryPratchett
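The hex dump can be checked against the ARP header layout from RFC 826. This short Python sketch decodes it with `struct`; the hex string is copied from the dump, which appears to start at the ARP header (tcpdump's `-x` output omits the link-level header). The first 28 bytes are the ARP message proper, and the remaining 18 are the padding that brings the payload up to the 46-byte minimum. The `parse_arp` helper and its field names are illustrative choices.

```python
import ipaddress
import struct

# Bytes copied from the tcpdump dump above (ARP payload only).
DUMP = (
    "0001 0800 0604 0001 0003 971d c009 5bf0"
    "b0fe 0000 0000 0000 5bf0 b001 474e 5520"
    "5465 7272 7950 7261 7463 6865 7474"
)


def parse_arp(data: bytes) -> dict:
    """Decode an Ethernet/IPv4 ARP message (RFC 826) plus any trailing padding."""
    # htype, ptype, hlen, plen, oper, sender MAC/IP, target MAC/IP = 28 bytes
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", data[:28])
    return {
        "htype": htype,            # 1 = Ethernet
        "ptype": ptype,            # 0x0800 = IPv4
        "oper": oper,              # 1 = request, 2 = reply
        "sender_mac": sha.hex(":"),
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": tha.hex(":"),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
        "padding": data[28:],      # filler up to the 46-byte minimum payload
    }


pkt = parse_arp(bytes.fromhex(DUMP))
print(pkt["target_ip"], pkt["sender_ip"], pkt["padding"])
```

The decoded fields match tcpdump's summary line: a request for 91.240.176.1 from 91.240.176.254, with the padding reading as ASCII text.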

P.S. It was CoPP, but we don't understand why that process would delay ARPs by 3s.
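CoPP (control plane policing) rate-limits traffic headed for the switch's own CPU. As a rough illustration of how policing leaves only some ARPs answered, here is a generic single-rate token-bucket policer in Python. It is not Cisco's implementation, and the rate and burst values are hypothetical. A policer like this drops excess packets rather than queueing them, so it can account for the missing replies but not for the consistent 2.99s delay.

```python
class TokenBucketPolicer:
    """Generic single-rate token-bucket policer (illustrative, not Cisco's code)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens added per second (packets/s allowed)
        self.burst = burst    # bucket depth (largest back-to-back burst)
        self.tokens = burst   # bucket starts full
        self.last = 0.0       # time of the previous update, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at `now` conforms, False if dropped."""
        # Refill for the time elapsed since the last packet, capped at the depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # conforming: passed up to the control plane
        return False      # exceeding: dropped, so the ARP is never answered


# Hypothetical: policer at 1 packet/s, burst 1, ARPs arriving every 0.25s.
policer = TokenBucketPolicer(rate=1.0, burst=1.0)
passed = [policer.allow(i * 0.25) for i in range(8)]
print(passed.count(True), "of", len(passed))  # 2 of 8
```

With ARPs arriving every 0.25s against a 1 packet/s rate, only the first request and the one a second later get through; the rest are silently dropped.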

8 comments:

Unknown, Friday 25 March 2016 at 11:31:00 GMT
CoPP?
Niall, Friday 25 March 2016 at 11:45:00 GMT
Is dynamic ARP inspection enabled on the Nexus?
Chris, Friday 25 March 2016 at 14:27:00 GMT
The specific model of Nexus would be useful, as features vary across the range. As Edward suggested, CoPP (control plane policing) lets you rate-limit traffic to the control plane, potentially including ARP. There may also be specific hardware rate-limiters. ARP would be considered a lower-priority task, so I would also check whether the CPU on the Nexus is running high; perhaps your ARP issue is a side effect of another issue. Dynamic ARP inspection wouldn't be my first thought unless DHCP-assigned addresses are in use.
Mike, Monday 28 March 2016 at 13:21:00 BST
ARP a "lower priority task"?? Since no IP traffic can pass until that exchange completes, I find that an interesting design decision.
Alexis Threlfall, Tuesday 29 March 2016 at 11:44:00 BST
Sorry, the sigmonster burped this out shortly after I read this article, so I felt I should share:
I know that it seems strange about the NAT thing, but the explanations from Cisco are very similar to the explanations given by my girlfriend. Neither makes any sense; up means down and no means yes more often than not, but you can never be certain until you try and fail a few times, and only once in a while do you get lucky and it works.