I am encountering a very stubborn VM (2008R2, VMware Tools slightly outdated, the version that shipped with 5.5U3a) on an ESXi 6.0U2 cluster running on Dell R630 servers. From the outside, the VM becomes unresponsive after some time (it might be a day, it might be a week) and stops responding to pings, connection requests and so on. It runs an industrial application and some MSSQL. The same behaviour could already be observed when the cluster ran 5.5U3a, though.
So I try to restart the VM via the Web Client or via the fat client. Nothing happens, even after hours. Next escalation step:
esxcli vm process kill -w <worldID> -t soft
No response, no change. I skip -t hard and go directly to
esxcli vm process kill -w <worldID> -t force
No response here, either. The VM stays unresponsive, and the world simply refuses to be killed. There's no error message. Rebooting the host that runs the VM is the last resort.
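For reference, the escalation described above can be sketched as a small script for the ESXi shell. This is only a sketch: kill_escalating, world_alive and WAIT_SECS are names made up for illustration, it includes the -t hard step that was skipped above, and it assumes that esxcli vm process list prints a line of the form "World ID: <worldID>" for every running VM.

```shell
#!/bin/sh
# Sketch only: escalate through the kill types for one VM world.
# WAIT_SECS and both function names are illustrative, not part of esxcli.

WAIT_SECS="${WAIT_SECS:-30}"

# Returns 0 while the given world ID still shows up in the process list.
# Assumes the list output contains a line ending in "World ID: <id>".
world_alive() {
    esxcli vm process list | grep -q "World ID: $1\$"
}

# Try soft, then hard, then force; stop as soon as the world is gone.
kill_escalating() {
    world_id="$1"
    for kill_type in soft hard force; do
        world_alive "$world_id" || return 0
        echo "trying -t $kill_type on world $world_id"
        esxcli vm process kill -w "$world_id" -t "$kill_type"
        sleep "$WAIT_SECS"
    done
    # Report whether the world survived all three attempts.
    if world_alive "$world_id"; then
        echo "world $world_id survived soft, hard and force" >&2
        return 1
    fi
    return 0
}
```

Called as kill_escalating <worldID>, it returns non-zero in exactly the situation described here, where the world is still listed after the force attempt.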
How can I identify the root cause of this very strange behaviour?