
I've been on hold with VMware support for an hour now, and I'm betting Server Fault can beat them to the answer!

I am running ESX 4.0 and 4.1 on 6 HP blades, using Fibre Channel LUN storage. We did some FC network maintenance over the weekend and took down 2 of the 4 paths the ESX hosts have to the storage array (EMC Clariion). When this happened, all 6 ESX hosts shut down all of their VMs.

I saw messages like this in the Events log:

Path redundancy to storage device naa.600.... degraded. Path vmhba0:.... down. 2 remaining active paths Affected datastores: ....

This was expected. Then, 3 minutes later:

Guest OS shutdown for vm1 (this was by the vpxuser)

vm1 is powered off (user "User")

Why would it do this if there were still good paths? I don't see any setting like this anywhere. Thanks!

carillonator
  • No idea, but I run the same version on the same blades using FC (Emulex in my case), with HP/Hitachi/3Par storage, and haven't seen anything like that before. Can't help thinking it might be a storage problem. – Chopper3 Jun 20 '11 at 20:15
  • This looks very much like HA isolation response, though that's not supposed to react to storage changes. However, if your hosts didn't handle the path loss gracefully and froze for any extended period of time (15 seconds or more), they could get the impression that they couldn't communicate with each other and trigger isolation response. – Max Alginin Jun 20 '11 at 20:29
  • @ynguldyn You're absolutely right: the storage path issue was a red herring. We made an Ethernet networking change at exactly the same time, the ESX hosts lost the ability to ping each other, and so HA triggered shutdowns of all VMs. (You should post your answer so I can mark it correct!) – carillonator Jun 20 '11 at 23:02
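To make the diagnosis in these comments concrete, here is a deliberately simplified model of how an HA host decides it is isolated. It is a sketch under stated assumptions, not VMware's implementation: the 15-second window stands in for what I believe is the default of the `das.failuredetectiontime` advanced option in vSphere 4.x, the "isolation address" is assumed to be something like the service console's default gateway, and `HostView`, `is_isolated` and `isolation_action` are hypothetical names. The point it illustrates is that storage path state is not an input at all; only management-network heartbeats and the isolation-address ping are.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of vSphere 4.x HA isolation handling.
# The 15 s window is an ASSUMED default for das.failuredetectiontime;
# this is an illustration, not VMware's actual implementation.
FAILURE_DETECTION_S = 15.0


@dataclass
class HostView:
    """What one host can observe over its management network(s)."""
    seconds_since_peer_heartbeat: float  # time since any other HA node was heard
    isolation_address_reachable: bool    # e.g. a ping to the default gateway (assumed)


def is_isolated(view: HostView) -> bool:
    # Isolated only if peers have been silent for the whole window AND the
    # isolation address also fails to answer. Storage path state is not an input.
    return (view.seconds_since_peer_heartbeat >= FAILURE_DETECTION_S
            and not view.isolation_address_reachable)


def isolation_action(view: HostView, response: str = "shutdown") -> str:
    """Action the host applies to every VM it runs ("none" if not isolated)."""
    return response if is_isolated(view) else "none"


# Ethernet change from the question: peers and gateway unreachable for 20 s.
print(isolation_action(HostView(20.0, False)))  # shutdown
# Storage path loss only: the management network still carries heartbeats.
print(isolation_action(HostView(1.0, True)))    # none
```

A host that freezes for longer than the window (the scenario in the earlier comment) would look the same to this model as one whose network was cut, since in both cases it hears no peer heartbeats.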

1 Answer

As we worked out in the comments, this looked like, and turned out to be, HA isolation response.

To add a bit more value to the answer: to avoid such mishaps, I recommend setting up another network path for HA by configuring an additional service console port (ESX) or management port (ESXi) that uses a path completely separate from your main network stack (vSwitch, pNICs, physical switch, UPS, power circuit).
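Why "completely separate" matters can be sketched with a toy model: treat each HA heartbeat network as the set of components it depends on, and heartbeats survive as long as at least one network has no failed component. All component names below (`vSwitch0`, `vmnic0`, `switch-A`, and so on) are hypothetical placeholders, and the model ignores real HA details; it only shows that any component shared by every heartbeat path is still a single point of failure.

```python
# Hypothetical model: each HA heartbeat network is the set of components it
# depends on. All component names are illustrative placeholders.
PRIMARY = {"vSwitch0", "vmnic0", "switch-A", "UPS-1", "circuit-1"}
SECOND_SHARED = {"vSwitch0", "vmnic1", "switch-A", "UPS-1", "circuit-1"}
SECOND_SEPARATE = {"vSwitch1", "vmnic2", "switch-B", "UPS-2", "circuit-2"}


def heartbeats_survive(networks, failed):
    """True if at least one network has none of its components in `failed`."""
    return any(not (net & failed) for net in networks)


def single_points_of_failure(networks):
    """Components whose failure alone takes down every heartbeat network."""
    return set.intersection(*networks)


# A second port that shares the vSwitch, switch, UPS, etc. adds little protection:
print(sorted(single_points_of_failure([PRIMARY, SECOND_SHARED])))
# -> ['UPS-1', 'circuit-1', 'switch-A', 'vSwitch0']
print(heartbeats_survive([PRIMARY, SECOND_SHARED], {"switch-A"}))    # False
print(heartbeats_survive([PRIMARY, SECOND_SEPARATE], {"switch-A"}))  # True
```

With the fully separate second network there is no shared component left, so no single failure (like the Ethernet change in the question) can silence every heartbeat path at once.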

Max Alginin