Diagnosing a Linux routing issue: multiple external IPs and several internal subnets with multiple routing tables

Question

This is my first post on Stack Exchange. First, I'd like to thank this community; I have learned many things here over the course of my computer engineering journey. :)

With this post I am mainly looking for direction on how to proceed in diagnosing and fixing my routing issue. This is by far the most complicated server setup I have attempted, and I feel the issue rests somewhere in how Linux handles network interfaces and traffic. I will continue my research into the intricacies of Linux networking, but hopefully someone here can help direct my search.

My issue is this: I have five internet-routable IPs (all with different MAC addresses) assigned to an external interface, with an internal interface presiding over several internal VLAN subnets. Tying all these together are several routing tables with rules set for specific internal server IPs to route them to the different external IPs and vice versa. Most traffic flows properly except for some seemingly random outliers. For example, workstations on the internal physical network cannot access duckduckgo.com and other seemingly random websites, while others load fine.

What really baffles me is that I have a bridged OpenVPN setup which bridges a class B /21 subnet across two locations. Both locations are set up with separate DHCP server instances which direct internet-bound traffic directly out their corresponding connection. When I'm at the second [client] location I can point a workstation's default route to the primary [server] location (effectively traversing the layer 2 tunneled VPN link from the secondary location to exit onto the internet at the primary location) and I can hit all these sites, such as duckduckgo.com, without any issue. This doesn't seem to be a firewall or routing issue but instead a nuance I am unaware of in how Linux handles traffic.

Some Notes:

My ISP is a smaller local ISP providing fiber in town. Their network IP addressing configuration is peculiar to me when compared to how Charter (the local cable company) handles IP addressing. The local fiber ISP has several /24 internet-routable IP subnets that all seem to reside on the same Data Link Layer. For example, in my configuration I have one DHCP-assigned IP which usually resides on one /24 subnet while my four static IPs live on another. On my end I put all IPs on different MAC addresses, but they all communicate through the same Physical and Data Link Layer. Furthermore, their main router uses the same MAC address for all subnets, as shown by the following arp output (where br14 is the DHCP interface and mac0-3 are macvlan devices):

XXX-XXX-97-1.static.ISP  ether   cc:4e:24:9f:47:00   C                     br14
XXX-XXX-32-1.static.ISP  ether   cc:4e:24:9f:47:00   C                     mac0
XXX-XXX-32-1.static.ISP  ether   cc:4e:24:9f:47:00   C                     mac1
XXX-XXX-32-1.static.ISP  ether   cc:4e:24:9f:47:00   C                     mac2
XXX-XXX-32-1.static.ISP  ether   cc:4e:24:9f:47:00   C                     mac3

Though it's not what I would've done, from my research I can't find any evidence that this is overly improper. I will say, though, that when I set net.ipv4.conf.all.rp_filter=1 with logging of martians, my syslog blows up with traffic not meant for me in any way.

Feb 10 17:49:44 srv3 kernel: [   12.770624] IPv4: martian source XXX.XXX.33.147 from XXX.XXX.33.1, on dev mac0

My sysctl.conf is currently as follows:

net.ipv4.conf.all.rp_filter=2

net.ipv4.ip_forward=1

net.ipv4.conf.all.arp_filter=1
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.all.arp_ignore=1

net.ipv4.conf.all.log_martians=1
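
One detail worth checking with these settings: for rp_filter (0 = off, 1 = strict, 2 = loose) the kernel uses the higher of the conf/all value and the per-interface value when validating sources on a given interface. The following is a minimal sketch for inspecting that, not part of my actual setup; the helper names max_rp_filter and effective_rp_filter are made up for illustration.

```shell
# Sketch only: both function names are hypothetical helpers.
# The kernel applies max(conf/all/rp_filter, conf/<iface>/rp_filter)
# when doing source validation on <iface>.
max_rp_filter() {
    # $1 = value from conf/all, $2 = value from conf/<iface>
    if [ "$1" -ge "$2" ]; then echo "$1"; else echo "$2"; fi
}

# effective_rp_filter DEV [BASE_DIR]
# Reads both values (BASE_DIR defaults to the real /proc/sys location)
# and prints the one the kernel would use for DEV.
effective_rp_filter() {
    base=${2:-/proc/sys/net/ipv4/conf}
    max_rp_filter "$(cat "$base/all/rp_filter")" "$(cat "$base/$1/rp_filter")"
}
```

For example, `effective_rp_filter mac0` would show whether mac0 ends up in strict or loose mode regardless of which of the two sysctls was set.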

Before I go into detail on how my setup gets configured, I'd just like to reiterate where I am in a simple paragraph. I have an external physical interface configured with five IPs, with policy-based routing directing the traffic to the internal subnets. For the most part this works, except for seemingly random endpoints on the internet. An even weirder symptom is that when I am remote and traverse a layer 2 tunneled VPN connection (bridged to the same bridge device the local physical network workstations use) and exit to the internet the same way, I don't have these issues. I am not using the client-to-client directive in OpenVPN, so my understanding is that all traffic should be processed through the kernel in the same manner.

I am at my wits' end and the limit of my understanding at this point. From what I can tell so far, I am missing something going on inside Linux. If anyone knows of a simple fix I'd be so happy. However, I'm fairly certain this is something that will be specific to the 'parameters' I am working within and the paradigm I am trying to create. I feel this should work and I am so close; I'm seeking advice on how to dive deep into diagnosing what is going on under the covers. Even if (though I'm not expecting this) it is something to do with my ISP, I just need to know how to make a case with relevant data to point to. I have a good relationship with the owner of the ISP, so if it comes down to that I am all ears. Again, how do I start to do a real deep dive into this?

---Here's the long, and probably not needed, explanation of how the server gets configured upon boot. Just to include as much data as I can. Thank you for your interest and help!---

Here's the setup: it's an Ubuntu 20.04 host operating system with two physical NICs. eno1 is the external (internet) interface, which resides on the motherboard. The second interface is an 802.1Q VLAN-capable card which acts as the internal interface.

The network configuration can be broken down into three main components: 1.) The external interface configuration which has five internet routable IPs. 2.) The internal interface configuration which consists of several class B and C subnets on separate VLANs. Most of these subnets reside on bridge interfaces. 3.) Tying all these together are multiple routing tables using iproute2 with the firewall being handled by nftables.

Most of the interfaces are brought up with netplan. Things to note: the five external IPs reside all on separate virtual interfaces. First is the main bridge interface [br14] which is assigned an IP via DHCP while the remaining four are macvlan interfaces configured mainly through a script executed by networkd-dispatcher while their static IPs are assigned in netplan.

Netplan Conf:

network:
  ethernets:
    eno1:
      match:
        macaddress: B8:XX:XX:XX:XX:A2
    eno2:
      match:
        macaddress: 00:XX:XX:XX:XX:71
      set-name: eno2
    mac0:
      addresses: [XXX.XXX.XXX.66/24]
    mac1:
      addresses: [XXX.XXX.XXX.67/24]
    mac2:
      addresses: [XXX.XXX.XXX.135/24]
    mac3:
      addresses: [XXX.XXX.XXX.136/24]
  vlans:
    eno2-vlan1:
      id: 1
      link: eno2
    eno2-vlan2:
      id: 2
      link: eno2
    eno2-vlan3:
      id: 3
      link: eno2
    eno2-vlan4:
      id: 4
      link: eno2
    eno2-vlan5:
      id: 5
      link: eno2
    eno2-vlan6:
      id: 6
      link: eno2
    eno2-vlan7:
      id: 7
      link: eno2
  bridges:
    br14:
      interfaces: [eno1]
      macaddress: B8:XX:XX:XX:XX:A2
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
      nameservers:
        addresses: [XXX.XXX.XXX.5, 8.8.8.8]
    br0:
      addresses: [192.168.0.6/29]
    br1:
      interfaces: [eno2-vlan1]
      addresses: [172.23.3.30/21]
    br2:
      interfaces: [eno2-vlan2]
      addresses: [172.20.3.30/21]
    br3:
      interfaces: [eno2-vlan3]
      addresses: [172.22.3.30/21]
    br4:
      interfaces: [eno2-vlan4]
      addresses: [192.168.3.6/29]
    br5:
      interfaces: [eno2-vlan5]
      addresses: [172.21.3.30/21]
    br6:
      interfaces: [eno2-vlan6]
      addresses: [192.168.3.14/29]
    br7:
      interfaces: [eno2-vlan7]
      addresses: [192.168.1.93/27]
  version: 2

In /etc/networkd-dispatcher/routable.d is a script which looks like this:

#!/bin/bash

if [ ! -d "/run/rt_tables" ]; then
    mkdir -m 750 /run/rt_tables
    touch /run/rt_tables/availDevs
    chmod 640 /run/rt_tables/availDevs
fi

echo "$IFACE" >> /run/rt_tables/availDevs

case "$IFACE" in
    'br14')
        ip link add mac0 link br14 address B8:XX:XX:XX:XX:A3 type macvlan
        ip link add mac1 link br14 address B8:XX:XX:XX:XX:A4 type macvlan
        ip link add mac2 link br14 address B8:XX:XX:XX:XX:A5 type macvlan
        ip link add mac3 link br14 address B8:XX:XX:XX:XX:A6 type macvlan
    ;;
    *)
        /usr/local/sbin/checkAvailableDevs.sh
    ;;
esac

So far this essentially covers components 1 and 2 of how I broke down the network configuration earlier. The third component is mainly set up via the "checkAvailableDevs.sh" script, which is called in the default branch of the case statement.

Here is the checkAvailableDevs.sh script with a brief explanation following:

#!/bin/bash
tableSetupScript="`dirname $0`/rtableSetup.sh"
rt_runDir="/run/rt_tables"
rt_tablesConfDir="\/usr\/lib\/rt_tables"

function echoTableVarDir() {
    echo $(echo ${rt_tablesConfDir//\\/})
}

tableNames=($(find `echoTableVarDir` -type f | sed -r \
        -e "/`echo $rt_tablesConfDir`\/t_/"'!d' \
        -e "s/(`echo $rt_tablesConfDir`\/t_)(.+)/\2/"))

for tableName in ${tableNames[*]}
do
    source `echoTableVarDir`/t_$tableName

    declare -a availDevs
    for reqDev in ${reqDevs[*]}
    do
        availDevs+=(`
            while read -r availDev; do
                if [ "$reqDev" == "$availDev" ]; then
                    echo "$availDev"
                fi
            done < $rt_runDir/availDevs
        `)
    done

    if [ ${#reqDevs[@]} -eq ${#availDevs[@]} ]; then
        if [ ! -f "$rt_runDir/`echo $tableName`_configured" ]; then
            $tableSetupScript `echoTableVarDir`/t_$tableName
            touch $rt_runDir/"$tableName"_configured
            chmod 640 $rt_runDir/"$tableName"_configured
        fi
    fi

    unset availDevs
done

Disclaimer: I don't claim to be the best bash script writer. I'm primarily a Java engineer and this is again, the most complicated system configuration I've built. Please forgive possible inefficiencies.

Moving on, an explanation of checkAvailableDevs.sh:

First, as seen in the networkd-dispatcher script, every time an interface triggers the networkd-dispatcher script, the interface name is appended to the file /run/rt_tables/availDevs. Later, when the checkAvailableDevs.sh script is executed, it finds all files located in /usr/lib/rt_tables, each of which represents the configuration for one routing table. These files are named t_{tableName}. For example, the file t_mail1 configures the routing table named "mail1".
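
The t_{tableName} naming convention can also be resolved without the escaped sed expressions. This is only a sketch of an alternative, not what my script does: list_table_names is a hypothetical helper, and unlike the find call above it does not descend into subdirectories.

```shell
# list_table_names DIR  (hypothetical helper)
# Prints one routing table name per regular file named t_<name> in DIR.
list_table_names() {
    dir=$1
    for f in "$dir"/t_*; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -f "$f" ] || continue
        # Strip everything up to and including the final "/t_".
        printf '%s\n' "${f##*/t_}"
    done
}
```

Calling `list_table_names /usr/lib/rt_tables` would then print mail1 for a file named t_mail1.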

These routing table configuration files look like this:

#!/bin/bash
tableName="mail1"

reqDevs=(mac2 br0 br1 br2 br7)

fromIP="172.XXX.XXX.XXX"
initDefaultGW=1
gwDev="mac2"
gwIP="XXX.XXX.XXX.1"
devRoutes=(br0 br1 br2 br7)

What happens is checkAvailableDevs.sh checks the device names in the "reqDevs" array against the /run/rt_tables/availDevs list of interfaces deemed routable by networkd-dispatcher. If all devices are available a second script called rtableSetup.sh is executed using the t_{tableName} file for specific configuration parameters required for the routing table.
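
The same availability check can be written as a direct membership test. The sketch below is an assumption-laden alternative, with all_devs_available being a made-up name. One possible advantage is that it does not compare counts, so it would still pass if an interface name were ever appended to availDevs more than once.

```shell
# all_devs_available FILE DEV...  (hypothetical helper)
# Succeeds only if every DEV appears as a whole line in FILE.
all_devs_available() {
    list=$1
    shift
    for dev in "$@"; do
        # -F: fixed string, -x: match the whole line, -q: no output
        grep -Fxq -- "$dev" "$list" || return 1
    done
    return 0
}
```

With the t_mail1 file above, the call would look like `all_devs_available /run/rt_tables/availDevs mac2 br0 br1 br2 br7`.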

rtableSetup.sh:

#!/bin/bash

# (function args)
# $1 - dev
#
# [SET] devCDIR
get_dev_CDIR() {
    devCDIR=`ip address show $1 | sed -r \
            -e '/inet\s/ !d' \
            -e 's/(.+inet\s)(.+\..+\..+\..+)(\sbrd.+)/\2/'`
}

# (function args)
# $1 - devCDIR
#
# [SET] netCDIR
get_net_CDIR() {
    netCDIR=`ipcalc $1 | awk '$1 == "Network:" { print $2 }'`
}

# (function args)
# $1 - dev
#
# [SET] netCDIR
get_net_CDIR_from_dev() {
    get_dev_CDIR $1
    get_net_CDIR $devCDIR
}


tableVarFile=$1
if [ "$tableVarFile" ]; then
    source $tableVarFile

    if [ "$tableName" ]; then
        ip rule add from $fromIP table $tableName

        if [ "$removeFromDefaultTable" ] && [ $removeFromDefaultTable == 1 ]; then
            if [ "$fromDev" ]; then
                get_net_CDIR_from_dev $fromDev
                echo "attempting to remove $fromDev from default table"
                echo "ip route del $netCDIR dev $fromDev"
                ip route del $netCDIR dev $fromDev
            else
                echo "Error: \$fromDev not specified in table var file: $tableVarFile"
                exit 1
            fi
        fi

        if [ "$initDefaultGW" == 1 ]; then
            if [ "$gwIP" ] && [ "$gwDev" ]; then
                get_net_CDIR_from_dev $gwDev
                ip route add default via $gwIP dev $gwDev table $tableName
                ip route add $netCDIR dev $gwDev table $tableName
            else
                echo "Error: \$gwIP or \$gwDev not defined in table var file: $tableVarFile"
                exit 1
            fi
        fi

        if [ ${#devRoutes[@]} -gt 0 ]; then
            for devRoute in ${devRoutes[*]}
            do
                get_net_CDIR_from_dev $devRoute
                ip route add $netCDIR dev $devRoute table $tableName
            done
        else
            echo "Warning: No DevRoutes configured in table var file: $tableVarFile"
        fi
    else
        echo "Error: No tableName defined"
    fi
fi

Those three files (checkAvailableDevs.sh, t_{tableName}, and rtableSetup.sh) are responsible for the third component of my networking setup. I can verify that all three components of this network configuration work as I expect: they set up the external interfaces, the internal interfaces, and finally the routing tables. I probably don't need to include all these scripts, but I want to be as thorough as possible.
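
To double-check what the kernel actually decides for a given flow, read-only iproute2 queries can help. The following is a rough sketch: policy_check_cmds is a hypothetical helper that only prints the commands, and the addresses in the usage note are placeholders rather than values from my network.

```shell
# policy_check_cmds TABLE SRC_IP IN_DEV DST_IP  (hypothetical helper)
# Prints read-only iproute2 commands; nothing is executed or changed here.
policy_check_cmds() {
    table=$1
    src=$2
    indev=$3
    dst=$4
    # Which rules exist, and in what priority order?
    printf 'ip rule show\n'
    # What does the named table contain?
    printf 'ip route show table %s\n' "$table"
    # Which route would a forwarded packet from SRC_IP arriving on IN_DEV take?
    printf 'ip route get %s from %s iif %s\n' "$dst" "$src" "$indev"
}
```

Something like `policy_check_cmds mail1 172.20.0.10 br2 1.1.1.1 | sh` would run the three queries; the last one should report mac2 as the outgoing device if the rule for that source IP is being hit.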

score 0 · Answer 1 · answered Mar 15 '21 at 17:52

I solved the issue a while back. I was setting the MTU on the VPNs to 1492; when the tap interface was added to a network bridge, that dropped the overall bridge MTU down to 1492, while the external internet interface was still at 1500. This caused fragmentation, which some sites didn't like.
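
For anyone chasing a similar mismatch, here is a minimal sketch of how the MTUs could be compared and probed. The helper names and the tap0 interface name are assumptions for illustration; the 28 bytes are the 20-byte IPv4 header (without options) plus the 8-byte ICMP header.

```shell
# icmp_payload_for_mtu MTU  (hypothetical helper)
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# print_mtus DEV...  (hypothetical helper)
# Prints "<dev> <mtu>" for each named interface, read from sysfs.
print_mtus() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(cat "/sys/class/net/$dev/mtu")"
    done
}

# Example usage (interface names are placeholders):
#   print_mtus br1 tap0 eno1
#   ping -M do -s "$(icmp_payload_for_mtu 1492)" duckduckgo.com
```

With iputils ping, `-M do` sets the don't-fragment bit, so a payload that is too large for the path fails outright instead of being silently fragmented.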

But hey, I learned some tcpdump in this process! Looking back this was an overly long post. Could've been more concise. Hopefully someone on some search finds this rambling helpful.
