
I'm trying to get Linux bonding working over a VPN (GRE-TAP). The funny thing is that it only works when I have tcpdump running on both hosts, but more on that later...

There are two machines, called pxn1 and pxn2. They are connected to each other via eth1 through a simple switch.

pxn1 has IP address 10.1.1.197
pxn2 has IP address 10.1.1.199

IPsec

To get a secure connection, all IP traffic is encrypted using IPsec. This works, I can ping between the two machines without any problem and tcpdump shows only encrypted packets.
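
The exact IPsec configuration shouldn't matter much here. As a rough sketch (not my literal configuration), a transport-mode ESP policy between the two hosts, e.g. with ipsec-tools/setkey and the actual SAs negotiated by an IKE daemon such as racoon, would look like this on pxn1:

#!/usr/sbin/setkey -f
flush;
spdflush;
# require ESP in transport mode for all traffic between pxn1 and pxn2
spdadd 10.1.1.197 10.1.1.199 any -P out ipsec esp/transport//require;
spdadd 10.1.1.199 10.1.1.197 any -P in ipsec esp/transport//require;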

GRE-TAP

A GRE-TAP interface (which tunnels Ethernet frames over IP) is then set up on both hosts, one in each direction, because I will need a virtual network interface later on:

ip link add vpn_gre_pxn2 type gretap local 10.1.1.197 remote 10.1.1.199 dev eth1
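
The interface then still has to be brought up; something like this does it:

ip link set vpn_gre_pxn2 up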

ifconfig shows:

vpn_gre_pxn2 Link encap:Ethernet  HWaddr 1a:73:32:7f:36:5f
          inet6 addr: fe80::1873:32ff:fe7f:365f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1462  Metric:1
          RX packets:19 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1294 (1.2 KiB)  TX bytes:1916 (1.8 KiB)

This is on pxn1. On the other host the same interface is set up in the other direction.
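
For reference, the mirrored setup on pxn2 would be roughly (the interface name vpn_gre_pxn1 is just my naming assumption):

ip link add vpn_gre_pxn1 type gretap local 10.1.1.199 remote 10.1.1.197 dev eth1
ip link set vpn_gre_pxn1 up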

Bridge

A bridge is set up that currently uses only the GRE-TAP device.

I need the bridge because later on I want to add more machines (my plan is to bridge all GRE tunnels together). The end result should become a VPN mesh network (with a dedicated GRE-TAP interface for each host-host combination). Since for now I'm just doing a first test with two machines, the bridge is of course somewhat pointless, but it is nonetheless important for the test itself.

brctl addbr vpn_br
brctl addif vpn_br vpn_gre_pxn2
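
The bridge membership can be double-checked (just a sanity check) with:

brctl show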

The bridge works: when I bring the vpn_br interface up and assign some IP addresses (just for testing the bridge), ICMP pings work perfectly.

vpn_br    Link encap:Ethernet  HWaddr 02:00:0a:01:01:c5
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1462  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:448 (448.0 B)  TX bytes:468 (468.0 B)
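
Such a test can be done with temporary addresses directly on the bridge, for example (the 192.0.2.x addresses are just placeholders, not the ones I actually used):

ip addr add 192.0.2.1/24 dev vpn_br    # on pxn1
ip addr add 192.0.2.2/24 dev vpn_br    # on pxn2
ping 192.0.2.2                         # from pxn1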

Bonding

A Linux Bonding interface is now set up. Again, since this is just a first proof of concept test, I'll only add a single slave to the bond.

Later on there will also be a real separate Gbit NIC with a dedicated switch that will act as the primary slave (with the VPN being just a backup), but for now the bonding interface will use the VPN only.

modprobe bonding mode=1 miimon=1000
ifconfig bond0 hw ether 02:00:0a:01:01:c5  # some dummy MAC
ifconfig bond0 up
ifconfig bond0 mtu 1462
ifenslave bond0 vpn_br   # as said, only a single slave at the moment
ifconfig bond0 172.16.1.2/24 up

The other host is set up as 172.16.1.1/24 with HWaddr 02:00:0a:01:01:c7.
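
For completeness, the corresponding commands on pxn2 look like this (assuming the bridge over there is also called vpn_br, which is just my naming convention):

modprobe bonding mode=1 miimon=1000
ifconfig bond0 hw ether 02:00:0a:01:01:c7  # dummy MAC, see above
ifconfig bond0 up
ifconfig bond0 mtu 1462
ifenslave bond0 vpn_br
ifconfig bond0 172.16.1.1/24 up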

This results in a theoretically working bonding interface:

bond0     Link encap:Ethernet  HWaddr 02:00:0a:01:01:c5
          inet addr:172.16.1.2  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::aff:fe01:1c5/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1462  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:448 (448.0 B)  TX bytes:468 (468.0 B)

The status also looks good to me:

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: vpn_br
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: vpn_br
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 1a:73:32:7f:36:5f
Slave queue ID: 0

...as does the routing table:

# ip route show
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.2
172.16.1.0/24 dev bond0  proto kernel  scope link  src 172.16.1.2
10.1.1.0/24 dev eth1  proto kernel  scope link  src 10.1.1.197
default via 10.1.1.11 dev eth1

NB: eth0 is a separate active NIC (Ethernet crossover cable), but that should not matter IMHO.

The problem

The setup looks good to me; however, ping does not work (this was run on pxn1):

# ping 172.16.1.1
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.
From 172.16.1.2 icmp_seq=2 Destination Host Unreachable
From 172.16.1.2 icmp_seq=3 Destination Host Unreachable
From 172.16.1.2 icmp_seq=4 Destination Host Unreachable

While pinging, tcpdump on the other machine (pxn2) says:

# tcpdump -n -i bond0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:45:13.013791 ARP, Request who-has 172.16.1.1 tell 172.16.1.2, length 28
17:45:13.013835 ARP, Reply 172.16.1.1 is-at 02:00:0a:01:01:c7, length 28
17:45:14.013858 ARP, Request who-has 172.16.1.1 tell 172.16.1.2, length 28
17:45:14.013875 ARP, Reply 172.16.1.1 is-at 02:00:0a:01:01:c7, length 28
17:45:15.013870 ARP, Request who-has 172.16.1.1 tell 172.16.1.2, length 28
17:45:15.013888 ARP, Reply 172.16.1.1 is-at 02:00:0a:01:01:c7, length 28

However, when I also run tcpdump on pxn1 in a separate terminal, I suddenly get my ICMP replies!

...
From 172.16.1.2 icmp_seq=19 Destination Host Unreachable
From 172.16.1.2 icmp_seq=20 Destination Host Unreachable
64 bytes from 172.16.1.1: icmp_req=32 ttl=64 time=0.965 ms
64 bytes from 172.16.1.1: icmp_req=33 ttl=64 time=0.731 ms
64 bytes from 172.16.1.1: icmp_req=34 ttl=64 time=1.00 ms
64 bytes from 172.16.1.1: icmp_req=35 ttl=64 time=0.776 ms
64 bytes from 172.16.1.1: icmp_req=36 ttl=64 time=1.00 ms

This only works as long as both machines have tcpdump running. I can start/stop tcpdump and consistently see replies only while the program is running on both machines at the same time. It doesn't matter on which machine I try it.
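
My guess is that this is related to promiscuous mode rather than to tcpdump itself. An (untested) way to verify that would be to capture with -p, which keeps the interface out of promiscuous mode, and to watch the interface flags and the kernel log:

tcpdump -p -n -i bond0 icmp   # -p: do not enable promiscuous mode
ip link show vpn_br           # PROMISC shows up in the flags while it is active
dmesg | tail                  # the kernel logs "device ... entered promiscuous mode"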

Is this a kernel bug, or (more probably) is there a problem with my configuration?

Is it normal that the bridge and the bonding interface both show the same MAC address? I only configured the address manually for the bonding interface, which apparently also changes the bridge's address.
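
For comparison, the MAC addresses of the involved interfaces can be read directly from sysfs:

cat /sys/class/net/bond0/address
cat /sys/class/net/vpn_br/address
cat /sys/class/net/vpn_gre_pxn2/address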

FYI, config overview:

Update

I get a working setup when I put the bridge interface into promiscuous mode (ifconfig vpn_br promisc). I'm not quite sure whether that is normally needed. OTOH, I don't think it has any downsides...
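
The iproute2 equivalent of that ifconfig call is:

ip link set vpn_br promisc on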

BTW, a similar Red Hat bug report exists, but setting bond0 down/up doesn't help in my case.

Udo G

1 Answer


Does it work without the bonding piece? I suspect the issue is that the LACP messages aren't getting across the bridge until you put it in promiscuous mode.

If you're using a 3.5 or higher kernel, it might also help to enable transmission of IGMP queries for the bridge interface. This could help the bridge subscribe to the LACP multicast group.

echo -n 1 > /sys/devices/virtual/net/vpn_br/bridge/multicast_querier
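
If the /sys/devices/virtual/net path does not exist on a given system, the same setting (where the kernel supports it at all) is usually also reachable via /sys/class/net:

echo -n 1 > /sys/class/net/vpn_br/bridge/multicast_querier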
Steven K
  • Yes, as described, the bridge itself (without bonding) works. Indeed, activating promiscuous mode helps (see 'Update' section). [This is a 2.6.32 kernel](http://pastebin.com/2Vw1VAhz), which might be the reason there is no `/sys/devices/virtual/net/vpn_br/` directory (only `lo` and `venet0` in there). – Udo G Jan 07 '15 at 08:57