
I am building a client/server application that uses several network interfaces in parallel for redundancy. I have noticed that when one network interface goes down or comes up, communication on the other interfaces hangs for several seconds.

I can reproduce this behavior without my application in a simple way:

  • computer 1 has 2 interfaces available (Ethernet and WiFi)
  • from computer 2, ping the IP address of computer 1's Ethernet connection
  • disconnect the WiFi on computer 1
  • the ping hangs for several seconds, and then packets travel again between the 2 computers.

The hang also happens when I turn the WiFi connection on computer 1 back on. It also happens if I ping the WiFi IP and turn the Ethernet connection off/on (or unplug/plug the cable).
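To put a number on the hang in the reproduction above, the ping output can be post-processed. The following is only a minimal sketch: it assumes the ping was run with `ping -D` so each reply line starts with a `[seconds.microseconds]` timestamp, and that replies carry an `icmp_seq=` or `icmp_req=` field (older iputils versions print the latter). The function name `find_gaps`, the 2-second threshold, and the addresses in the sample are illustrative choices, not something from the setup described here.

```python
import re

# Matches reply lines printed by `ping -D`, for example:
# [1401696000.000000] 64 bytes from 192.168.1.10: icmp_req=1 ttl=64 time=0.412 ms
REPLY = re.compile(r"^\[(\d+\.\d+)\].*\bicmp_(?:seq|req)=(\d+)")


def find_gaps(lines, threshold=2.0):
    """Return (seq_before, seq_after, seconds) for each pause longer than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = REPLY.match(line)
        if not match:
            continue  # skip the header, the summary and any error lines
        current = (float(match.group(1)), int(match.group(2)))
        if previous is not None and current[0] - previous[0] > threshold:
            gaps.append((previous[1], current[1], round(current[0] - previous[0], 3)))
        previous = current
    return gaps


# Hypothetical capture: replies stop after sequence 2 and resume at sequence 10.
sample = [
    "PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.",
    "[1401696000.000000] 64 bytes from 192.168.1.10: icmp_req=1 ttl=64 time=0.412 ms",
    "[1401696001.000000] 64 bytes from 192.168.1.10: icmp_req=2 ttl=64 time=0.398 ms",
    "[1401696009.500000] 64 bytes from 192.168.1.10: icmp_req=10 ttl=64 time=0.455 ms",
]
print(find_gaps(sample))  # [(2, 10, 8.5)]
```

Saving the output of `ping -D` to a file while toggling the other interface, and then passing the file's lines to `find_gaps`, would list each pause with the sequence numbers on either side of it.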

I am using Linux Ubuntu 12.04 on both computers.

Any ideas why this is happening, and whether / how it can be avoided?

Cristian Ciupitu
  • You may have configured your network such that some of the packets use one interface while some use the other. Before bringing down an interface check with `tcpdump` if there is any traffic on the interface. Bringing down an interface with traffic on it is expected to cause a brief interruption before this traffic gets routed the other way. – kasperd Jun 02 '14 at 08:31
  • Thanks for the reply. I have checked the traffic with Wireshark, and the traffic happens only between the expected IPs, not on both IPs. When I open the TCP socket, I bind it to one IP. Claudiu – user3698377 Jun 02 '14 at 10:04
  • I was not suggesting the wrong IP was being used, I was suggesting the traffic may be going over a different interface from what you expect. – kasperd Jun 02 '14 at 10:17
  • My application does not use the interface that I am closing, but other things are going on, like SSDP. Still, it takes several seconds to re-route this, and it is even worse when I reconnect. During this period all the network interfaces are blocked... There must be something wrong... – user3698377 Jun 02 '14 at 19:26

1 Answer


This is due to an interesting combination of Linux's "promiscuous ARP" behaviour and ARP caching. By default, Linux answers an ARP request for any address configured on the host, regardless of which interface the request arrives on. So the WiFi interface, despite not having the IP address itself, receives ARP requests for the Ethernet address and sends ARP responses. If that is the ARP response the other machine on the local subnet (usually the router) receives first, that is what goes into its ARP cache. When the WiFi then goes down, traffic still addressed to its MAC address goes nowhere. The pause in traffic ends when the ARP entry goes stale and the ARP request is repeated, at which point the other interface gives the only response, and everything proceeds normally.
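One way to check whether this explanation applies is to look at the ARP cache on the pinging machine and see which MAC address it holds for the Ethernet IP. The sketch below parses text in the format of Linux's `/proc/net/arp`. The helper names and all addresses in the sample are hypothetical and only serve as illustration.

```python
def parse_arp_table(text):
    """Map each IP address in /proc/net/arp-style text to its cached MAC address."""
    table = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, _device = fields[:6]
            table[ip] = mac.lower()
    return table


def cached_mac_matches(text, ip, expected_mac):
    """True if ip is cached with expected_mac, False if cached with another MAC, None if absent."""
    mac = parse_arp_table(text).get(ip)
    if mac is None:
        return None
    return mac == expected_mac.lower()


# Hypothetical cache contents on computer 2: the Ethernet IP of computer 1
# (192.168.1.10) is cached with a MAC other than the Ethernet card's own.
sample = """IP address       HW type     Flags       HW address            Mask     Device
192.168.1.10     0x1         0x2         00:11:22:33:44:66     *        eth0
192.168.1.1      0x1         0x2         00:aa:bb:cc:dd:ee     *        eth0
"""
ETHERNET_MAC = "00:11:22:33:44:55"  # assumed MAC of computer 1's Ethernet interface
print(cached_mac_matches(sample, "192.168.1.10", ETHERNET_MAC))  # False
```

On a real system the text would come from reading `/proc/net/arp` on computer 2. A result of `False` for the Ethernet IP would be consistent with the WiFi interface having answered the ARP request.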

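To stop Linux doing this "ARP on all interfaces for any address configured", you need to set the sysctl `net.ipv4.conf.<interface>.arp_ignore` to 1. With that value, an interface only answers ARP requests for addresses configured on that same interface.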
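Before and after changing the setting, it can help to confirm what value is actually in effect. The kernel applies the larger of the `all` value and the per-interface value, so the sketch below reads both from `/proc/sys/net/ipv4/conf`. The function name and the `conf_dir` parameter are illustrative assumptions; the parameter exists only so the function can be pointed at a different directory.

```python
import os


def effective_arp_ignore(interface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Return the arp_ignore value applied to ARP requests arriving on interface."""
    def read(name):
        with open(os.path.join(conf_dir, name, "arp_ignore")) as handle:
            return int(handle.read().strip())

    # The kernel uses the maximum of conf/all and conf/<interface>.
    return max(read("all"), read(interface))
```

For example, `effective_arp_ignore("wlan0")` (with `wlan0` standing in for whatever the WiFi interface is called) returning 0 would mean the default behaviour is still active. The value itself is normally changed as root with something like `sysctl -w net.ipv4.conf.all.arp_ignore=1`, and made persistent through `/etc/sysctl.conf`.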

womble