
I have an application running a raw IP socket; the destination of this socket is governed by routes installed via the 'ip route add' command. These routes can change during the lifetime of the socket (e.g. because the next hop changes).

Simplified: let's say I have two interfaces, eth0 and eth1, and a default route via eth0.

The raw socket's endpoint is, for example, 10.10.10.10, and eth1 has address 100.0.0.1. I run the following during the raw socket's lifetime:

ip -f inet route delete 10.10.10.10
ip -f inet route add 100.0.0.2 dev eth1
ip -f inet route add 10.10.10.10/32 via 100.0.0.2 dev eth1

Now what I see is that after this operation traffic goes correctly via eth1 for a few seconds, then it goes wrong (via eth0) for a short while (less than half a second), and then it is correct again (permanently, as far as I can tell).

So my main question is: can anybody explain what might go wrong here? I tried adding 'ip route flush cache' after the sequence above, but that didn't do anything. I'm currently puzzled as to why traffic sometimes gets dropped. I think it's either a timing issue in the routing commands or some other trigger disabling the route for a split second, but I'm running out of options.

I did try the SO_BINDTODEVICE option on my raw socket, but alas that didn't help much; the main difference is that when traffic goes wrong it isn't sent out at all, because it would have to go over the wrong interface. What I had hoped for, however, was that this would set errno to something like E_CANNOTROUTE (which doesn't exist) so I could catch the error and retry sending the packet. It currently does not, but is there a way I could catch such a failure? I have (almost) full control over the system and the application that runs the socket.

One solution I know would work is to use AF_PACKET sockets instead of L3 raw sockets (and do ARP/ND myself), but I'd rather not go there just yet.

Update

I have improved the behavior of my system by changing how routes are updated. When I have to update the next hop, I now look at the route that is already installed and act based on that:

  • If no route is there, I just install the new route and skip the delete.
  • If the exact route is already present (same next hop, same device), I do nothing.
  • If another next hop is present for this route, I do a more specific delete for just that next hop, followed by an add.
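The decision logic above can be sketched as a small shell function (the function name and the route-string format are hypothetical; in practice you would compare against the output of `ip route show` for the prefix):

```shell
#!/bin/sh
# Sketch of the next-hop update logic described above.
# choose_action prints which route operation(s) to perform, given:
#   $1 = currently installed route for the prefix ("" if none)
#   $2 = desired route, e.g. "via 100.0.0.2 dev eth1"
choose_action() {
    current=$1
    desired=$2
    if [ -z "$current" ]; then
        echo "add"                  # nothing installed: plain add, skip the delete
    elif [ "$current" = "$desired" ]; then
        echo "nothing"              # exact route already present: do nothing
    else
        echo "delete-specific add"  # other next hop: specific delete, then add
    fi
}
```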

While this stabilized most of my issues, I still sometimes see the same thing happen (though much less often) when an actual delete+add takes place (the last case in the new mechanism). This also still doesn't explain what goes wrong (it merely circumvents it), so I'll leave this question open for now, as I'm really curious what's happening here.

FYI: I see this issue on CentOS, from CentOS 4 through CentOS 6, 32-bit.

KillianDS
  • Do your raw packets sequence in any way? Can you be sure that the packets you're seeing for less than half a second aren't just packets that needed dequeueing on the old interface before the route changed? – Matthew Ife Dec 28 '13 at 17:57

2 Answers


If I understand correctly, the packets should always go out eth1, and your problem is that, when updating to a new next hop on eth1, your packets sometimes go out eth0? That would be because your delete+add is not an atomic operation.

Try doing the add first, followed by the delete. The delete has to be specific (including the device and next hop, I believe) so that it doesn't also delete the new route you just added.
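Alternatively, iproute2 can collapse the two steps into one operation with `ip route replace`, which swaps the route in a single netlink request instead of a delete followed by an add (I haven't verified how old an iproute2 supports this, so check on your CentOS version before relying on it):

```shell
# Replace (or create) the /32 route in one step, avoiding the
# delete-then-add window during which the default route would match.
ip -f inet route replace 10.10.10.10/32 via 100.0.0.2 dev eth1
```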

Law29

Is there a default route (or another route covering 10.10.10.10/32) via eth0? If you delete first and then add, you have a race condition: the delete happens, packets go out via the default route during the window between the delete and the add, and then the add happens and packets start going where you expect.

It definitely sounds like some form of race condition to me, most likely due to the non-atomic nature of the two routing operations you mentioned (as Law29 stated).