VMware High Availability questions

Question

I can't seem to find the answers to these questions within the vSphere 5 Documentation center, so please share if you are aware of any aspects of these questions.

What happens to a failed VM that has been configured for High Availability (HA)? For Fault Tolerance (FT)? Is it automatically stopped and its resources released, with the VM then restarted (HA) or the secondary instance taking over (FT)?
What are the typical methods to debug what has happened to a failed VM? Through logs or a VMware VM monitoring UI? Would the user be notified of the VM failure?
Is it possible to clone a VM before it is restarted or failed over to later debug what happened to it?

Thanks!!

Rex · Accepted Answer · 2015-03-04T23:13:23.003

vSphere behavior for each technology is defined in the documentation. You do, however, seem to have an incorrect view of what these technologies are designed for. Both VMware HA and VMware FT are designed primarily to provide availability of the guests in the event of host failures.

VMware HA

If a master host is unable to communicate directly with the agent on a slave host, the slave host does not respond to ICMP pings, and the agent is not issuing heartbeats, it is considered to have failed. The host's virtual machines are restarted on alternate hosts. If such a slave host is exchanging heartbeats with a datastore, the master host assumes that it is in a network partition or network isolated and so continues to monitor the host and its virtual machines.
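The decision described in that passage can be sketched roughly as below. This is only a simplified reading of the quoted text, not how the HA agent is actually implemented; the function name, parameter names, and the `UNDETERMINED` state are hypothetical and exist purely for illustration.

```python
from enum import Enum


class HostState(Enum):
    REACHABLE = "reachable"
    FAILED = "failed"
    PARTITIONED_OR_ISOLATED = "partitioned or isolated"
    UNDETERMINED = "not covered by the quoted passage"


def classify_slave_host(agent_reachable, responds_to_ping, datastore_heartbeat):
    """Classify a slave host from the master's point of view.

    Hypothetical model of the quoted documentation text only.
    """
    if agent_reachable:
        # The master can still talk to the agent, so nothing to decide.
        return HostState.REACHABLE
    if responds_to_ping:
        # Agent unreachable but host answers pings: the quoted text
        # does not describe this case, so we do not guess.
        return HostState.UNDETERMINED
    if datastore_heartbeat:
        # No network contact, but the host is still writing datastore
        # heartbeats: treated as partitioned or isolated, and the master
        # keeps monitoring it.
        return HostState.PARTITIONED_OR_ISOLATED
    # No agent contact, no ping, no heartbeats: considered failed,
    # and its VMs are restarted on alternate hosts.
    return HostState.FAILED
```

For example, `classify_slave_host(False, False, True)` gives `HostState.PARTITIONED_OR_ISOLATED`, while the same call with the last argument `False` gives `HostState.FAILED`.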

VMware FT

A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished within a few seconds. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data.

On your specific questions:

What happens to a failed VM that has been configured for High Availability (HA)? For Fault Tolerance (FT)? Is it automatically stopped and its resources released, with the VM then restarted (HA) or the secondary instance taking over (FT)?

VMware HA and FT are geared toward recovery from a host failure. You don't configure a VM for HA - you configure the hosts. In a host failure, HA restarts the guest on a different host. vCenter can also do limited guest heartbeat monitoring (with VMware Tools), which can trigger a reset of the guest machine on the same host. In either case, it is not a clean shutdown; it is treated as a crash-consistent shutdown/restart cycle.

FT creates a duplicate guest running in lockstep with the source. In the event of a host failure, the secondary guest automatically takes over and vCenter creates a new secondary (if possible). Guest heartbeat monitoring is not done, as any change causing the primary to hang would be duplicated on the secondary. FT is strictly to provide access to guests in the event of host failures.
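As a rough summary of the two paragraphs above, here is a small lookup of which response to expect for which event. It is only a sketch of the behavior described in this answer; the function, its arguments, and the returned strings are hypothetical names for illustration and are not part of any VMware API.

```python
def expected_response(protection, event, guest_monitoring=False):
    """Return the response described in this answer.

    protection: "HA" or "FT"
    event: "host_failure" or "guest_hang"
    guest_monitoring: True if guest heartbeat monitoring via VMware Tools
                      is enabled (only relevant for HA).
    """
    if protection == "HA":
        if event == "host_failure":
            return "restart guest on a different host (crash-consistent)"
        if event == "guest_hang":
            if guest_monitoring:
                return "reset guest on the same host (crash-consistent)"
            return "no automatic action"
    elif protection == "FT":
        if event == "host_failure":
            return "secondary takes over; new secondary created if possible"
        if event == "guest_hang":
            # The hang is mirrored on the secondary, so failing over would not help.
            return "no automatic action"
    raise ValueError(f"unknown combination: {protection!r}, {event!r}")
```

Note that none of the branches involves a clean shutdown or a preserved copy of the failed guest, which is why the clone question further down gets a "no".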

What are the typical methods to debug what has happened to a failed VM? Through logs or a VMware VM monitoring UI? Would the user be notified of the VM failure?

As is often the case, it depends. Typical troubleshooting/debugging starts with the error messages. If you want to go beyond that into general troubleshooting techniques, it probably goes beyond the scope of this site. End users of the guest would not be notified. vCenter admins can be notified if you have alerting set up and configured properly in vCenter, or if you are using third-party monitoring tools.

Is it possible to clone a VM before it is restarted or failed over to later debug what happened to it?

As both HA and FT are geared toward HOST failures, this is not possible within the bounds of the built-in technologies.

You do, in essence, configure a VM for HA in the HA cluster settings by setting the `VM restart priority`. A VM with its restart priority set to disabled will not be restarted on a new host in the event of the failure of the original host. — joeqwerty, Mar 04 '15 at 23:25
Thanks for your input! It certainly helps answer some of my questions. However, I'm still not 100% sold on HA and FT being geared only towards recovery from host failures. If you look here (http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.avail.doc/GUID-D8E38A73-F14F-45A0-90CE-6048EF227C38.html), it also lists the ability to monitor hosts, VMs, and applications. Maybe I'm still missing something? — O_O, Mar 04 '15 at 23:50
VMware VM monitoring only applies if the guest VM fails to send heartbeat information within a set period of time. The heartbeat is the only consideration, so winlogin failures and app failures have no bearing. You still need a cluster for application HA. — Jim B, Mar 05 '15 at 03:51
@joeqwerty true - I was more trying to say that HA is designed primarily to guard against host failure - not to guard against guest failure but to ensure availability of the guests in the event of a host failure. — Rex, Mar 05 '15 at 15:16
As I mentioned, it can do limited monitoring of the guests if VMware Tools is installed. It's pretty limited though - the options (or lack thereof) that you can configure for this can be found here (http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.avail.doc/GUID-D6BE0A44-1A0B-40AA-9A24-9670927CCFFD.html) — Rex, Mar 05 '15 at 15:19
@Rex: Agreed. I just wanted to clarify that the failover of a VM is dependent upon the `VM restart priority` setting. — joeqwerty, Mar 05 '15 at 15:50
