4

We recently migrated our entire VMware cluster from ESX over to ESXi. For the most part, the transition was seamless, and I haven't missed having access to the SC. Until now.

We're trying to diagnose some odd unicast flooding behavior that's happening during vMotions, and we suspect that the cause may be related to a discrepancy between switchgear CAM table cache expiry and the ARP table expiry on each ESXi host. As such, I've been trying to figure out how to view and clear the ARP table in ESXi.
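
To make the suspected timer mismatch concrete, here is a minimal sketch. It assumes the simplest model: the peer stays silent, so the switch ages out its CAM entry while the host still holds a valid ARP entry and keeps sending unicast frames that the switch has to flood. The function name and the 1200-second host ARP timeout are illustrative assumptions, not values taken from ESXi. 300 seconds is a common switch default for CAM aging, but check your own switchgear.

```python
def flood_window_seconds(cam_aging_s: float, arp_timeout_s: float) -> float:
    """Worst-case time during which a host still trusts its ARP entry for a
    silent peer whose MAC the switch has already aged out of the CAM table.

    While this window is open, frames to that MAC are unknown unicast from
    the switch's point of view and get flooded to every port in the VLAN.
    """
    return max(0.0, float(arp_timeout_s) - float(cam_aging_s))


# Illustrative values only (the ARP timeout here is an assumption).
window = flood_window_seconds(cam_aging_s=300, arp_timeout_s=1200)
print(f"Potential flooding window: {window:.0f} s")
```

If the ARP timeout is shorter than the CAM aging time, the window collapses to zero, which is why clearing or shortening the ARP cache would be a useful test here.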

On ESX (with the full SC), this would have been a cinch - just ssh in and run an arp -a. Unfortunately the neutered shell within ESXi doesn't include the ARP command, and I have not been able to find a single piece of documentation on this within VMware's KB.

I do have a support request in with VMware on this (going on 30 hours without an answer), but figured I'd toss it over here first to see if anyone has ideas. Thanks!

EEAA
  • While you're waiting for VMware to get back to you, you can look at the CAM table on the switch. You could also insert a packet sniffer between the ESXi hosts to check the destination MAC addresses in the traffic between the hosts and see if that confirms your suspicions, no? – joeqwerty Nov 04 '10 at 00:19
  • Yes, we've done packet captures. The problem seems to be that during vMotions, the source node will ARP for the vmkernel interface of the target. The target will reply, but from the physical MAC of the host instead of the VMware "virtual" MAC of the vmkernel interface. As such, the switch gets confused and starts flooding to all ports on that VLAN, which quickly causes buffer overruns and packet drops, thereby causing the vMotion to take a *long* time. – EEAA Nov 04 '10 at 01:49
  • Well, I just got off the phone with an escalation engineer at VMware, and he said that, amazingly, there's no way to clear the ARP table in ESXi. Ugh. – EEAA Nov 05 '10 at 20:55

4 Answers

2

Without the service console you need to use vCLI. It works with ESX/ESXi hosts.

Right now, I can't find a documented way to clear the ARP tables via RemoteCLI. The best I can find is here: Top Five New vCLI commands in vSphere 4.1

list all active connections: esxcli network connection list

list all ARP table entries: esxcli network neighbor list
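
If you want to run these from a script on the vCLI workstation, something like the following sketch may help. It assumes the vCLI `esxcli` binary is on the PATH and that the `--server` and `--username` connection options are used. The host name and user in the usage comment are hypothetical, and password handling is left to vCLI itself. The command namespaces are the 4.1-era ones quoted above and may differ in other releases.

```python
import subprocess


def build_esxcli_command(server, username, namespace_args):
    """Build the argument list for a remote esxcli call via vCLI."""
    return ["esxcli", "--server", server, "--username", username, *namespace_args]


def run_esxcli(server, username, namespace_args):
    """Run the command and return its raw text output.

    Raises subprocess.CalledProcessError if esxcli exits non-zero.
    """
    cmd = build_esxcli_command(server, username, namespace_args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Usage (hypothetical host and user):
#   print(run_esxcli("esxi01.example.com", "root", ["network", "neighbor", "list"]))
#   print(run_esxcli("esxi01.example.com", "root", ["network", "connection", "list"]))
```

Capturing the neighbor list before and after a vMotion and diffing the two outputs is one way to watch the ARP entries change without shell access.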

Hope this helps. Let us know what support says.

andyhky
  • Thanks for the vCLI pointers. I'll give those a try tomorrow morning when I get into work. – EEAA Nov 04 '10 at 02:26
  • I'll mark this one as accepted for the vCLI ARP viewing commands. Unfortunately, per VMware, there doesn't appear to be a way to clear or otherwise manipulate the ARP table. – EEAA Nov 05 '10 at 20:57
2

After discussing with VMware, I learned that there is no way to clear or otherwise manipulate the ARP table on ESXi 4.1. I feel strongly that being able to perform these actions can be critical for troubleshooting, and I sure hope that they add this functionality in future versions of the product.

EEAA
1

ESXi 4.1 has the Remote CLI you can use, or if that doesn't support what you need, there's always the unsupported way. However, the best part is that, because you're using the latest and greatest 4.1, you can actually officially enable SSH.

Mark Henderson
  • The ESXi "Console" provides some general purpose commands via Busybox but VMware haven't chosen to support all that many commands and I don't think arp is included - I don't have access to an ESXi box right now so can't check. – Helvick Nov 03 '10 at 23:57
  • just checked an esx box. ssh'ed into it, arp is there - /sbin/arp – jqa Nov 04 '10 at 01:17
  • @James, esx or esxi? esx came with a fully-fledged Linux installation, esxi is missing a lot of the frilly bits. – Mark Henderson Nov 04 '10 at 01:38
  • It's ESXi, and there's no `arp`, unfortunately. I've enabled SSH. I do have a feeling that I'll need to use vCLI for this, but there seems to be zero documentation. :( – EEAA Nov 04 '10 at 01:50
1

Make sure all of your vmkernel ports are on separate subnets, e.g. separate subnets for vMotion, management, and iSCSI. Failing to do this can cause a lot of flooding during vMotion, because the physical switch does not learn the MAC address for the vMotion port correctly and continuously broadcasts to find it.
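
A quick way to sanity-check this is to list each vmkernel port's address and look for overlapping subnets. The sketch below uses only Python's standard `ipaddress` module. The port names and addresses are made-up examples, so substitute the values from your own hosts.

```python
import ipaddress
from itertools import combinations


def overlapping_vmk_subnets(ports):
    """Return pairs of vmkernel port names whose subnets overlap.

    `ports` maps a port name to its address in "ip/prefix" form.
    """
    nets = {
        name: ipaddress.ip_interface(addr).network
        for name, addr in ports.items()
    }
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical example: management and vMotion share 10.0.0.0/24.
example_ports = {
    "vmk0-mgmt": "10.0.0.11/24",
    "vmk1-vmotion": "10.0.0.12/24",
    "vmk2-iscsi": "10.0.2.11/24",
}
for a, b in overlapping_vmk_subnets(example_ports):
    print(f"{a} and {b} are on overlapping subnets")
```

An empty result means every vmkernel port has its own subnet, which is the layout recommended above.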

DataBitz