
I ran into a problem that occurs only on a few servers. Every 5-6 minutes I lose the connection to the server, and it comes back after a few minutes. In most cases it's a VM that becomes unreachable while other servers on the same host remain reachable.

I used tcpdump to capture the traffic. Here are the important parts of the capture:

  11:49:03.964855 IP6 :: > ff02::1:ffe5:8fb0: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ffe5:8fb0, length 24
  11:49:03.964961 IP6 :: > ff02::1:ffe5:8fb0: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ffe5:8fb0, length 24
  11:49:03.966280 ARP, Request who-has 84...* tell 84..., length 28
  11:49:03.966632 ARP, Reply 84... is-at 00:00:5e:00:01:03 (oui Unknown), length 46
  11:49:03.966643 IP 84....50879 > google-public-dns-a.google.com.domain: 18212+ PTR? 0.b.f.8.5.e.f.f.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
  11:49:08.970373 IP 84....50879 > google-public-dns-a.google.com.domain: 18212+ PTR? 0.b.f.8.5.e.f.f.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (90)
  11:53:18.356686 IP 84...* > *.cable.012.net.il: ICMP echo reply, id 61593, seq 23533, length 64
  11:53:18.801857 IP6 :: > ff02::1:ffe5:8fb0: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ffe5:8fb0, length 24
  11:53:18.801973 IP6 :: > ff02::1:ffe5:8fb0: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ffe5:8fb0, length 24

After the last packet there is silence for a few minutes, during which the server is unreachable. It's important to mention that while the server is unreachable from the outside, it can still connect to remote servers (when I work on it directly from the console). I'm aware there is a problem with IPv6, but I don't really understand why.
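To measure the silent window precisely, here is a minimal Python sketch that scans tcpdump's text output and reports every gap between consecutive packets longer than a chosen threshold. It assumes tcpdump's default `HH:MM:SS.microseconds` timestamps and a capture that doesn't cross midnight; `find_gaps` and the 60-second default are illustrative names and values, not part of any tool.

```python
import re
from datetime import datetime

# tcpdump's default timestamp (HH:MM:SS.microseconds) at the start of each packet line
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s")


def find_gaps(lines, threshold_s=60.0):
    """Return (previous_ts, next_ts, gap_seconds) for every silence >= threshold_s.

    Assumption: all lines come from a single capture that does not cross midnight,
    so comparing time-of-day values is enough.
    """
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip wrapped or non-packet lines
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= threshold_s:
                gaps.append((prev.strftime("%H:%M:%S.%f"), ts.strftime("%H:%M:%S.%f"), delta))
        prev = ts
    return gaps
```

Fed the excerpt above, this should flag the roughly four-minute gap between 11:49:08 and 11:53:18, when nothing at all was captured.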

The setup is: Juniper router (I don't have access to it) -> L2 switch -> Proxmox -> VM

The same thing happens for me on ESXi with IPv6.

VM - CentOS release 6.3 (Final) - 2.6.32-39-pve
Proxmox - pve-manager/3.4-6/102d4547 (running kernel: 2.6.32-39-pve)
L2 - D-Link - factory reset, no special configuration.
When the VM is not reachable from the outside, pinging within the LAN makes the VM reachable from the world again (for only a few minutes).

On the Proxmox host itself:

Output of brctl showmacs vmbr0 when the server is not reachable from the world:

  1 ee:75:67:e5:8f:b0   no         0.59

When the server is reachable from the world:

  2 ee:75:67:e5:8f:b0   no       127.15
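The two snapshots differ in more than the ageing timer: while the VM is unreachable its MAC is learned on bridge port 1 (ageing 0.59 s), and while it is reachable it is learned on port 2 (ageing 127.15 s). Which interface each port number corresponds to isn't stated in the post. A rough Python sketch for comparing two `brctl showmacs` snapshots, assuming the usual four-column output (port no, mac addr, is local?, ageing timer); `parse_showmacs` and `port_changes` are hypothetical helper names:

```python
def parse_showmacs(text):
    """Map MAC -> (port_no, is_local, ageing_seconds) from `brctl showmacs <bridge>` text.

    Assumes data rows have exactly four whitespace-separated fields starting with a
    numeric port; the header row ("port no  mac addr  is local?  ageing timer") is skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        port, mac, is_local, ageing = fields
        table[mac.lower()] = (int(port), is_local == "yes", float(ageing))
    return table


def port_changes(before, after):
    """Return {mac: (old_port, new_port)} for MACs learned on a different bridge port."""
    return {
        mac: (before[mac][0], after[mac][0])
        for mac in before.keys() & after.keys()
        if before[mac][0] != after[mac][0]
    }
```

Run against the two outputs above, `port_changes` would report the VM's MAC moving from port 2 to port 1, which is worth correlating with the moments the VM drops off.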

I tried replacing the L2 switch, but it didn't help. I also installed a new server with a new VM, with the same results.

After I disabled IPv6 on the VM the problem stopped, but I don't really understand why. The server setup is completely basic.

danielch1

1 Answer

When you ran tcpdump, was the IPv6 traffic abnormally high? We experienced a similar issue caused by malfunctioning Intel I217-LM drivers on the network. In short, when machines with the affected drivers are put into sleep mode, they can enter a state where two or more of them constantly spam IPv6 messages back and forth. The volume isn't enough to even show up as abnormal traffic on the switches, but some devices (including one of our multi-function printers and our SonicWall) couldn't handle the high rate of this particular type of IPv6 traffic: they went to 100% CPU and were useless until the traffic stopped.

Here's a thread that describes it in a bit more detail: http://www.gossamer-threads.com/lists/cisco/nsp/177843

Searching for "HBH ICMP6, multicast listener report" also turns up others with similar issues.
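One way to answer the "abnormally high?" question from a text capture is to count MLD reports per second. A minimal Python sketch, assuming tcpdump's default timestamp format and that the reports print with the `multicast listener report` text seen in the question; the function names and the threshold of 100 per second are arbitrary placeholders, not a known limit for any device:

```python
import re
from collections import Counter

# Packet timestamp (whole seconds captured) followed by an IPv6 MLD report.
MLD_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d{6} IP6 .*multicast listener report")


def mld_reports_per_second(lines):
    """Count MLD 'multicast listener report' packets per wall-clock second."""
    counts = Counter()
    for line in lines:
        m = MLD_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def flood_seconds(lines, threshold=100):
    """Return the seconds (sorted) in which MLD reports meet or exceed the threshold."""
    return sorted(sec for sec, n in mld_reports_per_second(lines).items() if n >= threshold)
```

In the excerpt from the question only a couple of reports arrive in each burst, so a sustained flood like the one described above should stand out clearly if it is present in the full capture.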

One possible solution is to filter or throttle multicast traffic on the switch (look for storm control features on your switch) and, in my case, fixing the aberrant Ethernet drivers.