Last week we ran into the following issue: we had to shut down our entire infrastructure for a UPS replacement. Once the electrical work was done, we restarted:
- network
- SANs
- vCenter
- ESXis (2 in cluster)
After waiting for the ESXi hosts to start, we discovered that the cluster reported an error: Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster.
We then discovered that vCenter could not reach the ESXi hosts over the network: a switch's PDU had been unplugged during the work.
Once the PDU was plugged back in, the ESXi hosts could communicate with vCenter again, but the following alarm was raised on each host: vSphere HA agent cannot be correctly installed or configured.
We restarted both ESXi hosts, with no luck: the errors remained.
Because of maintenance window constraints, we decided to remove both hosts from the cluster so that we could start our VMs, at the cost of losing automatic failover if one host fails.
After a lot of googling and reading many VMware KBs, we tried the following (in no particular order):
- disconnect/reconnect hosts
- re-add the hosts to the cluster one by one, with no VMs
- restart the vSphere High Availability service (see "Reconfiguring HA (FDM)")
- re-check the network requirements (search for "Network port diagram for vSphere 6.x")
- disable/enable HA (see "Troubleshooting VMware High Availability")
- uninstall/reinstall FDM (KB 2056299)
None of this made any difference...
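For the network re-check step, a quick reachability test run from a machine on the management network can rule out blocked ports. The sketch below is only an illustration: the port numbers (443 for host management, 8182 for the HA agent) are my assumption and should be verified against the port diagram for your version, and it only tests TCP, not UDP.

```python
import socket

# Assumed ports, not taken from the steps above: 443 (host management)
# and 8182 (vSphere HA agent traffic). Check the official port diagram.
PORTS_TO_CHECK = (443, 8182)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_hosts(hosts, ports=PORTS_TO_CHECK):
    """Map each (host, port) pair to True (reachable) or False."""
    return {(host, port): tcp_port_open(host, port)
            for host in hosts
            for port in ports}
```

Calling `check_hosts(["esxi01.example.local", "esxi02.example.local"])` (hypothetical host names) returns a dictionary keyed by `(host, port)`; any `False` value points at a port worth looking into.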
Along the way, we found only one error in /var/log/fdm.log, on both hosts:
2018-06-25T09:05:54.232Z error fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterPersistence::DoFetchDataSync] Open of file /etc/opt/vmware/fdm/kvstore failed: No such file or directory
2018-06-25T09:05:54.232Z warning fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterManagerImpl::ReadPersistentObject] Couldn't open kvstore
Googling this kvstore thing led me nowhere; maybe I need to work on my google-fu...
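To make sure nothing else is hiding in fdm.log, the error and warning entries can be filtered out of a copy of the file. This is a minimal sketch: the line layout in the regular expression is inferred only from the two lines quoted above, so entries with a different shape are simply skipped, and the names `parse_line` and `find_problems` are mine.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the two quoted log lines; other entries may differ.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+fdm\[(?P<thread>[0-9A-Fa-f]+)\]\s+"
    r"\[(?P<origin>[^\]]*)\]\s+(?P<message>.*)$"
)


class FdmEntry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    origin: str
    message: str


def parse_line(line: str) -> Optional[FdmEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.strip())
    return FdmEntry(**match.groupdict()) if match else None


def find_problems(lines: Iterable[str],
                  levels=("error", "warning")) -> List[FdmEntry]:
    """Keep only the parsed entries whose level is in `levels`."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level in levels]


# The two lines from the post, used here as sample input.
SAMPLE = """\
2018-06-25T09:05:54.232Z error fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterPersistence::DoFetchDataSync] Open of file /etc/opt/vmware/fdm/kvstore failed: No such file or directory
2018-06-25T09:05:54.232Z warning fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterManagerImpl::ReadPersistentObject] Couldn't open kvstore
"""

for entry in find_problems(SAMPLE.splitlines()):
    print(entry.timestamp, entry.level.upper(), entry.message)
```

On a real copy of the log, passing an open file object to `find_problems` in place of `SAMPLE.splitlines()` should work the same way, since it only iterates over lines.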