I've got a bunch of Linux servers, each with multiple (three) NICs and their associated network interfaces. I'm tripping over a bizarre routing problem: traffic that should use the default route isn't using it, and is failing to get routed as a result. Here's what my routing table looks like:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.31.96.1      0.0.0.0         UG    0      0        0 em3
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 em1
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em3
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em4
# ip route list
default via 10.31.96.1 dev em3 proto static
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.100
10.31.96.0/22 dev em3 proto kernel scope link src 10.31.97.100
10.31.96.0/22 dev em4 proto kernel scope link src 10.31.96.61
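To confirm which route the kernel actually picks for a given destination, ip route get seems like the right tool. Sketching from the table above (this is what I'd expect to see, not a capture from the box, and 10.31.97.50 is just an example address on the /22):

# ip route get 10.31.45.106
10.31.45.106 dev em1 src 10.0.0.100
    cache
# ip route get 10.31.97.50
10.31.97.50 dev em3 src 10.31.97.100
    cache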
10.31.96.1 is my default gateway, which all traffic should be using (the em# naming is a Fedora thing; you can safely substitute 'eth' everywhere you see 'em' if that makes it easier to follow). Here's the ifconfig output:
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.100  netmask 255.0.0.0  broadcast 10.255.255.255
        inet6 fe80::b6b5:2fff:fe5b:9e7c  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7c  txqueuelen 1000  (Ethernet)
        RX packets 283922868  bytes 44297545348 (41.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 538064680  bytes 108980632740 (101.4 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeb60000-feb80000

em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.97.100  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7e  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7e  txqueuelen 1000  (Ethernet)
        RX packets 3733210  bytes 1042607750 (994.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1401537  bytes 114335537 (109.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfea60000-fea80000

em4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.96.61  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7f  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7f  txqueuelen 1000  (Ethernet)
        RX packets 2416588  bytes 196633917 (187.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 205038  bytes 19363499 (18.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeae0000-feb00000
em1/10.0.0.100 goes to a switch that is attached only to servers in the same rack; it's used only for the servers in that rack to communicate amongst themselves. em3 and em4 are both attached to the same subnet. The only difference between them is that em3 is not always up (it's associated with a floating IP address that follows whichever server is currently in the 'master' role). Basically, all traffic should be going out through em3 unless it's destined for something else on the local 10.0.0.0/8 subnet, in which case it should go out over em1. However, that's not what is happening. Traffic for 10.31.96.x, 10.31.97.x, and 10.31.99.x addresses goes through em3, but anything destined for 10.31.45.x tries to go through em1, and times out because nothing on that rack switch can route it.
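As far as I can tell, the kernel routes by longest prefix match: 10.31.45.106 falls inside 10.0.0.0/8 (the em1 route) but outside 10.31.96.0/22, so the /8 beats the default route and the packet heads out em1. If the rack-local segment is really something much narrower than a /8 (say a /24; that's purely my assumption, I haven't checked what the rack actually uses), I wonder whether narrowing em1's netmask would let the 10.31.45.x traffic fall through to the default gateway:

# ip addr del 10.0.0.100/8 dev em1
# ip addr add 10.0.0.100/24 dev em1   # /24 is a guess at the rack's real prefix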
This is also illustrated with the following command:

# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 60 byte packets
 1  cuda-fs1a-internal (10.0.0.100)  3006.650 ms !H  3006.624 ms !H  3006.619 ms !H
Yet when run from a system on the same network as the box above, with only a single network interface, it works:

# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 40 byte packets
 1  10.31.96.2 (10.31.96.2)  0.345 ms  0.403 ms  0.474 ms
 2  cuda-linux (10.31.45.106)  0.209 ms  0.208 ms  0.201 ms
I thought I could fix this by adding a route to 10.31.45.1 via em3, but that fails:
# route add default gw 10.31.45.1 em3
SIOCADDRT: Network is unreachable
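If I'm reading that error right, route add ... gw only accepts a gateway that's already reachable on a directly connected subnet, and 10.31.45.1 isn't on any of mine, hence the SIOCADDRT failure. I'm guessing the command I actually wanted was a route to the remote network through my existing, reachable gateway, something like:

# ip route add 10.31.45.0/24 via 10.31.96.1 dev em3   # /24 is a guess at the remote prefix

Even if that works, though, it only covers one remote subnet, and I'd rather understand why the box is behaving this way in the first place.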
I'm lost at this point on what else to try. Help?