
I posted yesterday about getting a working setup with several bridged interfaces for virtual machines (KVM/libvirt).

One of the bridged interfaces uses eth3 as its only port, while the second one (public traffic) uses a bonded Ethernet interface.

That setup works, but not all the time! I can start a download from a VM, and then it stops and freezes.

So I don't know if my bridge parameters are correct; could you check the config below?

iface eth3 inet manual

auto bond0
iface bond0 inet manual
    slaves eth1 eth2
    pre-up ip link set bond0 up
    down ip link set bond0 down

auto br0
iface br0 inet static
    address 10.160.0.7
    netmask 255.255.255.128
    bridge_ports eth3
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp on

auto br0:1
iface br0:1 inet static
    address 10.160.0.9
    netmask 255.255.255.255

auto br0:2
iface br0:2 inet static
    address 10.160.0.10
    netmask 255.255.255.255

auto br1
iface br1 inet static
    address 217.4.40.242
    netmask 255.255.255.240
    gateway 217.4.40.241
    pre-up /etc/network/firewall start
    bridge_ports bond0
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp on

auto br1:1
iface br1:1 inet static
    address 217.4.40.252
    netmask 255.255.255.255

auto br1:2
iface br1:2 inet static
    address 217.4.40.253
    netmask 255.255.255.255

And yes, the kernel also sometimes complains about martian sources on the host:

kernel: [249146.055172] martian source 10.160.0.17 from 10.160.0.10, on dev vnet2
kernel: [249146.073122] ll header: ff:ff:ff:ff:ff:ff:54:52:00:76:c3:5c:08:06

4 Answers


Sounds like a problem I'm facing too.
This is an example working bridged-bonding configuration for KVM, but it only utilizes one interface at a time. Maybe that depends on the switch (I used a Planet GSD-802S and an HP V1910). I'm using this at two locations (with different hardware and switches).

cat /etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=802.3ad miimon=100 downdelay=200 updelay=200 ad_select=0 lacp_rate=fast

cat /etc/network/interfaces

auto lo
iface lo inet loopback

# The bonded network interface
auto bond0
iface bond0 inet manual
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100
    bond_lacp_rate fast
    bond_ad_select 0
    up /sbin/ifenslave bond0 eth1 eth2
    down /sbin/ifenslave bond0 -d eth1 eth2

# Enslave all the physical interfaces
# Card #1: Nvidia Gigabit onboard
auto eth1
iface eth1 inet manual
    bond-master bond0

# Card #2: Intel PRO/1000 F Server Adapter (fiber)
auto eth2
iface eth2 inet manual
    bond-master bond0

# Bridge to the LAN for the KVM virtual network
auto br0
iface br0 inet static
    address 10.0.0.254
    netmask 255.255.255.0
    network 10.0.0.0
    broadcast 10.0.0.255
    gateway 10.0.0.1
    dns-nameservers 10.0.0.1 8.8.8.8
    bridge-ports bond0
    bridge-fd 9
    bridge-hello 2
    bridge-maxage 12
    bridge-stp off

# Card #3: internet modem
auto eth0
iface eth0 inet manual

# Bridge for the KVM virtual network (modem)
auto br1
iface br1 inet manual
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
    metric 1
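
Since 802.3ad only works if the switch actually negotiates LACP, it's worth verifying the aggregate after boot. A quick check, assuming the standard bonding proc interface:

cat /proc/net/bonding/bond0

Look for an "802.3ad info" section, with both eth1 and eth2 listed as slaves, "MII Status: up" on each, and matching aggregator IDs.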

TooMeeK

Apart from the fact that the pre-up/down attributes aren't required, and that you should turn on some ARP link monitoring, the config on that bond looks OK. However, you shouldn't set the netmask on alias interfaces; just let the kernel set the netmask correctly (it should be the same as the netmask on the main IP). I think the /32 mask is what's causing the martian problems.
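
If your version of ifupdown insists on a netmask for the alias stanzas, a minimal sketch of the fix is to give the aliases the same mask as br0's primary address (255.255.255.128 in your question) rather than /32:

auto br0:1
iface br0:1 inet static
    address 10.160.0.9
    netmask 255.255.255.128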

Without network dumps of the traffic around the time of the stall, it's hard to tell what might be the cause. A few ideas for tracking down the problem:

  • Make sure the network works with small packets (ping, etc)
  • Ensure that the problem is consistently reproducible (does it happen every time you try to download from the VM?)
  • Get rid of the bond, see if it's still reproducible (if it isn't, the bond is probably at fault)
  • Does the same download on the host machine cause the problem? (If it doesn't, then the problem isn't with the bond)
  • Try dropping the MTU on the VM's NIC (example commands after this list); I can't see anything in your host config that would cause issues, but other network devices on the path might.
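
For the capture, small-packet, and MTU tests, a few example commands (a sketch; the bridge name and host address are taken from the question, the guest NIC name is an assumption):

# On the host: capture traffic on the public bridge around the time of a stall
tcpdump -ni br1 -w stall.pcap

# From the guest: small-packet test towards the host
ping -s 56 10.160.0.7

# Inside the guest: temporarily lower the NIC MTU (eth0 assumed)
ip link set dev eth0 mtu 1400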
womble
  • I removed the pre-up and down attributes. My ARP link monitoring options are in the modprobe.d alias file. If I don't set the network mask, I get errors like: "Don't seem to have all the variables for br0:2/inet. Failed to bring up br0:2." About the list of ideas: 1. When the network freezes, nothing works (ping, ICMP echo request -> 64 bytes). 2. I don't yet have enough experience with the VMs to tell you exactly in which cases it happens. 3. Getting rid of the bond is not an option; it works on the host. –  Jan 12 '10 at 10:25
  • Finally, it doesn't change anything whether I use a /32 or the same subnet for the alias IPs. In all of those scenarios I get something like 40 to 50% packet loss on the host AND on the guest... even when the VMs are shut down. I also checked with only the bonding part (so no bridging) and got 0% packet loss, so it is definitely my bridge settings that aren't correct. –  Jan 12 '10 at 14:52

Is there some reason why you have spanning tree enabled? Unless you're providing some kind of redundant connectivity between external segments, it's not necessary, and it could potentially either block traffic or cause an upstream switch to temporarily shut the port down.
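
To test that theory, STP can be switched off at runtime with brctl (from bridge-utils), using the bridge names from the question; to make it persistent, change bridge_stp on to bridge_stp off in /etc/network/interfaces:

brctl stp br0 off
brctl stp br1 off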

rnxrx

Not really an answer, but rather a "me too" plus a workaround: I had a similar problem with Broadcom BCM5708 NICs in a Dell PowerEdge 2950. In my case there was just bonding + VLANs, no bridging. After a few days of running, the host would lose internet access. I didn't have many troubleshooting options and ended up adding an additional Ethernet card; now a bond between the on-board and the add-on card works fine.

pQd