14

This vCenter server was just upgraded to 5.1 update 1. I'm going through hosts and bringing firmware up to date, then upgrading them from various versions of 5.0 to 5.1u1.

vCenter 5.1u1 seems to have an interesting new behavior: it's removing hosts from maintenance mode when they reconnect after being disconnected -- but very inconsistently, I've seen it maybe 4 or 5 times on ~25-30 host reboots. I've only seen it happen on 5.0 hosts that have not yet been upgraded to 5.1.

tasks

In the image, I placed the host in maint mode and rebooted it into the HP SPP DVD's automatic update mode. After its usual ~40 minute update process, the host came back online.. and 7 seconds before even logging that the host had reconnected, vCenter had sent the host a task to exit maintenance mode.

events

In my understanding, the only time vCenter should drop a host out of maintenance mode is when vCenter put it into maintenance mode itself (such as a VUM upgrade task).

Why would this vCenter be unilaterally exiting a host from user-initiated maintenance mode?

Edit, additional info:

I ran the firmware upgrades on 5 more hosts, all at the same time. Two of them exited maint mode after reconnecting, three did not. The common factor of those exiting maint mode seems to be how long they were offline; the two that took a few tries to boot to the virtual media are the two that got knocked out of maint mode.

  • esx31 (image above): 45 minutes unresponsive
  • esx19 (exited maint): 87 minutes unresponsive
  • esx24 (stayed in maint): 32 minutes unresponsive
  • esx29 (stayed in maint): 39 minutes unresponsive
  • esx32 (stayed in maint): 30 minutes unresponsive
  • esx34 (exited maint): 70 minutes unresponsive

Edit: The disconnect time idea seems to have been a red herring, as it's not happening consistently.

Additionally, in the vpxd.log the exit maint mode task initiation seems to always immediately follow this vim.EnvironmentBrowser.queryProvisioningPolicy SOAP call. Here's the lines, slightly trimmed for clarity:

15:27:49.535 [info 'vpxdvpxdVmomi'] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx31, vim.EnvironmentBrowser.queryProvisioningPolicy)
15:27:49.560 [info 'commonvpxLro'] [VpxLRO] -- BEGIN task -- esx31 -- HostSystem.exitMaintenanceMode --

Note that on the nodes that don't get the exit task, the vim.EnvironmentBrowser.queryProvisioningPolicy event still occurs. I'm not seeing any other differences in events before or after this in the reconnect process, aside from the extra events caused by exiting maintenance mode.

Given the log's mention of provisioning policies, looking for autodeploy-related maintenance mode issues turns up complaints about similar behavior (though I'm not using autodeploy at all).

Shane Madden
  • 114,520
  • 13
  • 181
  • 251
  • You might wish to contact the VMware customer support line....or ask in one of the vmware groups. This could possibly be a bug in the programming. – mdpc Jun 06 '13 at 02:59
  • Also, which vCenter approach are you using? Appliance? Running on Windows? – ewwhite Jun 06 '13 at 12:26
  • @ewwhite Running on Windows. – Shane Madden Jun 06 '13 at 17:27
  • Hmm... [Possibly related to this](http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012473)? -- I'd say it definitely *shouldn't do that*... – voretaq7 Jun 06 '13 at 19:19
  • What kind of hardware are you using for your hosts? Our UCS was causing a similiar problem in that when a host was rebooted, some of them like to reboot twice, where as other ones (same blade type same firmware, same esx updates) would only reboot once. When I talked to Cisco about it they said "it's a known issue" – MoSiAc Jun 15 '13 at 13:02
  • @MoSiAc HP hosts - not sure if they rebooted twice as I wasn't watching the console, though I don't think they should have. Do you see this happening whenever they reboot twice? – Shane Madden Jun 15 '13 at 19:44

1 Answers1

2

I've seen this happen with ESXi 4.1 hosts after a patch accidentally wacked the /tmp/scratch folder. You might want to check if that directory still exists on the hosts that exited maintenance mode automatically.

If they're missing, you'll want to mkdir to create it. Also, you'll want to check if persistent scratch is setup correctly on each host by following this VMware KB article:

VMware KB: Creating a persistent scratch location for ESXi 4.x and 5.x

zippy
  • 1,718
  • 3
  • 21
  • 36