Linux bridged network intermittent packet loss (KVM context)

Question

I have a standard bridging setup between the real world and a KVM VM guest.

Bridging looks fine:

[root@t ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.40f2e9c6033d       no              eno2
                                                        vnet0
virbr0          8000.000000000000       no

The default gateway is br0 on the host.

I can ping the VM from the host and the host from the VM.

If I ping anything outside either from the VM or the host itself, I see intermittent packet loss:

[root@locoxen2 ~]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=55 time=4.59 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=55 time=4.59 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=55 time=4.67 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=55 time=4.75 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=55 time=4.69 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=55 time=1224 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=55 time=224 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=55 time=4.49 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=55 time=4.48 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=55 time=4.54 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=55 time=4.52 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=55 time=4.55 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=55 time=4.70 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=55 time=4.57 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=55 time=4.88 ms
64 bytes from 8.8.8.8: icmp_seq=25 ttl=55 time=4.65 ms
64 bytes from 8.8.8.8: icmp_seq=26 ttl=55 time=4.53 ms
64 bytes from 8.8.8.8: icmp_seq=36 ttl=55 time=1430 ms
64 bytes from 8.8.8.8: icmp_seq=37 ttl=55 time=430 ms
64 bytes from 8.8.8.8: icmp_seq=38 ttl=55 time=4.57 ms
64 bytes from 8.8.8.8: icmp_seq=39 ttl=55 time=4.53 ms
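
The gaps in this output can be counted with a short script. What follows is a minimal Python sketch, not part of the original setup; the function name `find_seq_gaps` is made up for illustration, and the sample is an excerpt of the ping output above. It lists each jump in `icmp_seq` and how many replies are missing in between.

```python
import re

def find_seq_gaps(ping_output):
    """Return (last_seen, next_seen, missing_count) for each jump in icmp_seq."""
    seqs = [int(m.group(1)) for m in re.finditer(r"icmp_seq=(\d+)", ping_output)]
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur, cur - prev - 1))
    return gaps

# Excerpt of the ping output shown above.
sample = """\
64 bytes from 8.8.8.8: icmp_seq=3 ttl=55 time=4.67 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=55 time=4.75 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=55 time=4.69 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=55 time=1224 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=55 time=224 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=55 time=4.49 ms
"""

for last_seen, next_seen, missing in find_seq_gaps(sample):
    print(f"{missing} replies missing between seq {last_seen} and seq {next_seen}")
```

Run against the full output, the same function would report a second gap between seq 26 and seq 36.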

This particularly happens if traffic is leaving the VM & host at the same time.

If I shut down the VM (i.e. ensure no traffic from that side of the bridge), I see no packet loss when I ping from the host as above.

Running tcpdump on the physical port (eno2) on the host whilst pinging from both host & VM at the same time shows me things like this (40:f2 is the host, 52:54 is the VM):

17:53:26.382679 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 23, length 64
17:53:27.200397 52:54:00:16:f5:f4 > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.221 > 8.8.8.8: ICMP echo request, id 11460, seq 2, length 64
17:53:27.382244 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 24, length 64
17:53:28.200304 52:54:00:16:f5:f4 > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.221 > 8.8.8.8: ICMP echo request, id 11460, seq 3, length 64

That is, packets go out, but no replies ever come back.

The same capture with no traffic from the VM, or with the VM shut down:

17:53:05.346226 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 2, length 64
17:53:05.350936 e8:f7:24:49:49:ee > 40:f2:e9:c6:03:3d, ethertype IPv4 (0x0800), length 98: 8.8.8.8 > 192.168.0.191: ICMP echo reply, id 27485, seq 2, length 64
17:53:06.348159 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 3, length 64
17:53:06.352855 e8:f7:24:49:49:ee > 40:f2:e9:c6:03:3d, ethertype IPv4 (0x0800), length 98: 8.8.8.8 > 192.168.0.191: ICMP echo reply, id 27485, seq 3, length 64
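
For a longer capture, pairing requests with replies by eye gets tedious. The following is a rough Python sketch, assuming tcpdump text output in the format shown above; the function name `unanswered_requests` is hypothetical. It matches each echo request to a reply by addresses, ICMP id, and sequence number, and returns the requests that were never answered.

```python
import re

# Matches the IP-level part of a tcpdump line, e.g.
# "192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 23"
ICMP_RE = re.compile(
    r"([\d.]+) > ([\d.]+): ICMP echo (request|reply), id (\d+), seq (\d+)"
)

def unanswered_requests(lines):
    """Return (src, dst, id, seq) for every echo request with no matching reply."""
    pending = {}
    for line in lines:
        m = ICMP_RE.search(line)
        if not m:
            continue
        src, dst, kind, icmp_id, seq = m.groups()
        if kind == "request":
            pending[(src, dst, int(icmp_id), int(seq))] = True
        else:
            # A reply travels in the opposite direction, so swap src and dst.
            pending.pop((dst, src, int(icmp_id), int(seq)), None)
    return list(pending)

# Lines taken from the two captures above.
capture = [
    "17:53:26.382679 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 23, length 64",
    "17:53:27.200397 52:54:00:16:f5:f4 > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.221 > 8.8.8.8: ICMP echo request, id 11460, seq 2, length 64",
    "17:53:05.346226 40:f2:e9:c6:03:3d > e8:f7:24:49:49:ee, ethertype IPv4 (0x0800), length 98: 192.168.0.191 > 8.8.8.8: ICMP echo request, id 27485, seq 2, length 64",
    "17:53:05.350936 e8:f7:24:49:49:ee > 40:f2:e9:c6:03:3d, ethertype IPv4 (0x0800), length 98: 8.8.8.8 > 192.168.0.191: ICMP echo reply, id 27485, seq 2, length 64",
]

for src, dst, icmp_id, seq in unanswered_requests(capture):
    print(f"no reply seen: {src} -> {dst} id={icmp_id} seq={seq}")
```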

iptables and ebtables show no rules - everything set to ACCEPT. I have switched off all offload functions on the ports. No bonding is being used. The MAC addresses are unique - no overlaps that I can perceive.

Note that I see this on both CentOS 6 and CentOS 7 host installs.

What am I overlooking?

Something is interrupting your network connection every 10 seconds, for 10 seconds. Check all your logs. — Michael Hampton, Dec 17 '17 at 19:30
Good idea. However, the network is staying up. Nothing in the logs to suggest otherwise. Also, it doesn't explain why I only see this happening when multiple MAC addresses are exiting the host (i.e. with the VM up). I'm wondering if it's a downstream issue with the switch it's connected to. — nroam, Dec 17 '17 at 20:42
Some half-baked BPDU handling in place? Switch the VM to NAT mode so the bridge is no longer in use on the host and all egress packets originate from the host's MAC, then check again with the VM up and also pinging. — dyasny, Dec 18 '17 at 02:35

score 3 · Answer 1 · answered Aug 30 '19 at 02:30

I've run into the same issue but the underlying problem was different. I ran across this question while trying to diagnose the problem and figured I'd leave an answer with another possible culprit, particularly when running KVM/qemu from the command line, for anyone else who might benefit from this information.

If you don't explicitly set the MAC address for a virtual network interface, qemu will assign it a default one. If you are running multiple virtual machines and haven't set the MAC addresses on their network interfaces, qemu will almost certainly give them all the same MAC address. (Note that most GUI KVM wrappers such as virt-manager will assign a random MAC address for you.)

Obviously, this wreaks havoc on layer 2 forwarding: the bridge and any upstream switch see the same address on several ports, and the VMs fight over who owns it.

Giving each VM a different MAC address resolved the issue for me.
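
One way to keep addresses distinct when launching qemu by hand is to derive each MAC from the VM's name. The following is a minimal Python sketch of that idea, not something from my actual setup: the VM names and both function names are hypothetical, and the `52:54:00` prefix is simply the one conventionally seen on QEMU/KVM guests (qemu's built-in default is `52:54:00:12:34:56`). Only three bytes of the hash are used, so collisions are unlikely but possible; the duplicate check covers that case.

```python
import hashlib

QEMU_PREFIX = "52:54:00"  # prefix conventionally used for QEMU/KVM guest NICs

def mac_for_vm(name):
    """Derive a stable MAC address from a VM name (illustrative scheme only)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    suffix = ":".join(f"{b:02x}" for b in digest[:3])
    return f"{QEMU_PREFIX}:{suffix}"

def find_duplicate_macs(macs_by_vm):
    """Return {mac: [vm, ...]} for every MAC used by more than one VM."""
    seen = {}
    for vm, mac in macs_by_vm.items():
        seen.setdefault(mac.lower(), []).append(vm)
    return {mac: vms for mac, vms in seen.items() if len(vms) > 1}

# Two hypothetical VMs started without an explicit MAC both get qemu's default.
before = {"vm1": "52:54:00:12:34:56", "vm2": "52:54:00:12:34:56"}
print("duplicates before:", find_duplicate_macs(before))

# Assign a name-derived MAC to each VM and check again.
after = {vm: mac_for_vm(vm) for vm in before}
print("duplicates after:", find_duplicate_macs(after))
for vm, mac in after.items():
    print(f"{vm}: -device virtio-net-pci,netdev=net0,mac={mac}")
```

The printed `-device` fragments assume a netdev named `net0`; adjust them to match the rest of your qemu command line.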

This. I'm so glad I came across this answer. I spent hours scratching my head trying to figure out why my packet loss was around 30% and SSH connections kept dropping. I even tried switching cables and switch ports, and plugged a whole new quad-port NIC into my R610, only to realize that all the `vnet*` interfaces had been assigned the same MAC address. Another indicator of a switching problem was running MTR and seeing multiple VMs' addresses appear, which was also strange. — bjd2385, Jun 01 '23 at 12:45

score 1 · Answer 2 · answered Dec 18 '17 at 14:07

In the end, there was nothing wrong with my setup.

The issue was that the upstream network switch was not set to use a static VLAN on the switchport concerned. The multiple MAC addresses it was seeing then caused confusion as to what VLANs should be assigned to the port.

How annoying!
