I've got a bunch of Linux servers, each with multiple (three) NICs and their associated network interfaces. I'm tripping over a bizarre routing problem: traffic that should use the default route isn't using it, and is failing to get routed as a result. Here's what my routing table looks like:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.31.96.1      0.0.0.0         UG    0      0        0 em3
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 em1
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em3
10.31.96.0      0.0.0.0         255.255.252.0   U     0      0        0 em4
# ip route list
default via 10.31.96.1 dev em3 proto static
10.0.0.0/8 dev em1 proto kernel scope link src 10.0.0.100
10.31.96.0/22 dev em3 proto kernel scope link src 10.31.97.100
10.31.96.0/22 dev em4 proto kernel scope link src 10.31.96.61
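To confirm which route the kernel actually picks for a given destination, ip route get seems like the right tool. Sketching from the table above (this is what I'd expect to see, not a capture from the box, and 10.31.97.50 is just an example address on the /22):

# ip route get 10.31.45.106
10.31.45.106 dev em1 src 10.0.0.100
    cache
# ip route get 10.31.97.50
10.31.97.50 dev em3 src 10.31.97.100
    cache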
10.31.96.1 is my default gateway, which all traffic should be using (the em# naming is a Fedora thing; you can safely substitute 'eth' everywhere you see 'em' if that makes it easier to follow). Here's the ifconfig output:
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.100  netmask 255.0.0.0  broadcast 10.255.255.255
        inet6 fe80::b6b5:2fff:fe5b:9e7c  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7c  txqueuelen 1000  (Ethernet)
        RX packets 283922868  bytes 44297545348 (41.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 538064680  bytes 108980632740 (101.4 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeb60000-feb80000

em3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.97.100  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7e  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7e  txqueuelen 1000  (Ethernet)
        RX packets 3733210  bytes 1042607750 (994.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1401537  bytes 114335537 (109.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfea60000-fea80000

em4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.31.96.61  netmask 255.255.252.0  broadcast 10.31.99.255
        inet6 fe80::b6b5:2fff:fe5b:9e7f  prefixlen 64  scopeid 0x20<link>
        ether b4:b5:2f:5b:9e:7f  txqueuelen 1000  (Ethernet)
        RX packets 2416588  bytes 196633917 (187.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 205038  bytes 19363499 (18.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xfeae0000-feb00000
em1/10.0.0.100 goes to a switch that is attached only to servers in the same rack; it's used only for the servers in that rack to communicate amongst themselves. em3 and em4 are both attached to the same subnet. The only difference between them is that em3 is not always up (it's associated with a floating IP address that follows whichever server is currently in the 'master' role). Basically, all traffic should be going out through em3 unless it's destined for something else on the local 10.0.0.0/8 subnet, in which case it should go out over em1. However, that's not what is happening. Traffic for 10.31.96.x, 10.31.97.x, and 10.31.99.x addresses goes through em3, but anything destined for 10.31.45.x tries to go through em1, and times out because nothing on that rack switch can route it.
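As far as I can tell, the kernel routes by longest prefix match: 10.31.45.106 falls inside 10.0.0.0/8 (the em1 route) but outside 10.31.96.0/22, so the /8 beats the default route and the packet heads out em1. If the rack-local segment is really something much narrower than a /8 (say a /24; that's purely my assumption, I haven't checked what the rack actually uses), I wonder whether narrowing em1's netmask would let the 10.31.45.x traffic fall through to the default gateway:

# ip addr del 10.0.0.100/8 dev em1
# ip addr add 10.0.0.100/24 dev em1   # /24 is a guess at the rack's real prefix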
This is also illustrated with the following command:

# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 60 byte packets
 1  cuda-fs1a-internal (10.0.0.100)  3006.650 ms !H  3006.624 ms !H  3006.619 ms !H
Yet when run from a system on the same network as the box above, with only a single network interface, it works:

# tcptraceroute cuda-linux
traceroute to cuda-linux (10.31.45.106), 30 hops max, 40 byte packets
 1  10.31.96.2 (10.31.96.2)  0.345 ms  0.403 ms  0.474 ms
 2  cuda-linux (10.31.45.106)  0.209 ms  0.208 ms  0.201 ms
I thought I could fix this by adding a route to 10.31.45.1 via em3, but that fails:
# route add default gw 10.31.45.1 em3
SIOCADDRT: Network is unreachable
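If I'm reading that error right, route add ... gw only accepts a gateway that's already reachable on a directly connected subnet, and 10.31.45.1 isn't on any of mine, hence the SIOCADDRT failure. I'm guessing the command I actually wanted was a route to the remote network through my existing, reachable gateway, something like:

# ip route add 10.31.45.0/24 via 10.31.96.1 dev em3   # /24 is a guess at the remote prefix

Even if that works, though, it only covers one remote subnet, and I'd rather understand why the box is behaving this way in the first place.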
I'm lost at this point on what else to try. Help?