My RHEL 6.1 virtual machines running on ESXi hosts 5.1.0 get frozen when the backup procedure tries to get a quiesced snapshot. I have to reset the virtual machines from VMware console. The tool used to quiesce the virtual machine is SYMCquiesce 1.0.0 by Symantec. The most weird is it does not fail always -- the backup is launched every week... but it fails only every two weeks.
After reading the VMware document http://kb.vmware.com/kb/2038606 and the RedHat document https://access.redhat.com/solutions/484303 and as I was not allowed to upgrade to RHEL 6.4, I decided to downgrade the VMware Tools to ESX 5.0. According to this table, I uninstalled VMware tools 9.0.15 and installed back 8.6.11.26309 (build-1310128). But this old version did not solve the problem... the virtual machines get on frozen... every two weeks.
The /etc/vmware-tools/tools.conf
contains
[vmbackup]
enableSyncDriver = false
[logging]
log = true
vmtoolsd.level = warning
vmtoolsd.handler = vmx
vmss.level = warning
vmss.handler = vmx
The logging section is included according to this VMware document to get some log messages in the ESXi host.
According to Symantec README file, the script /usr/sbin/pre-freeze-script
contains
/opt/SYMCquiesce/bin/freeze
The vmware.log
file, located in the ESXi host datastore, contains the following lines when the snapshot is succesfully created:
2016-01-11T23:39:27.849Z| vmx| I120: Vix: [12153447 vigorCommands.c:481]: VigorSnapshotManager_Take: takeOptions=136, clientFlags=0, displayName=NBU_SNAPSHOT backupcentral 1452555565 2016-01-11T23:39:27.879Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT backupcentral 1452555565', deviceState=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=18CA48E0, cbData=19CADF40 2016-01-11T23:39:27.893Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : open successful (65557) size = 188978561024, hd = 0. Type 3 2016-01-11T23:39:27.893Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : closed. 2016-01-11T23:39:27.927Z| vcpu-2| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED 2016-01-11T23:39:29.692Z| vcpu-2| I120: ToolsBackup: changing quiesce state: STARTED -> COMMITTED 2016-01-11T23:39:29.693Z| vcpu-0| I120: Destroying virtual dev for scsi0:0 vscsi=10159 2016-01-11T23:39:29.693Z| vcpu-0| I120: VMMon_VSCSIStopVports: No such target on adapter 2016-01-11T23:39:29.742Z| vcpu-0| I120: SnapshotVMXTakeSnapshotCB: Enter checkpoint status 1, mode 0.
But when it fails, it appears the VMware tools does not respond -- pay attention on the lines after IDLE -> STARTED
:
2016-01-18T23:03:40.412Z| vmx| I120: Vix: [12153447 vigorCommands.c:481]: VigorSnapshotManager_Take: takeOptions=136, clientFlags=0, displayName=NBU_SNAPSHOT backupcentral 1453158218 2016-01-18T23:03:40.432Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'NBU_SNAPSHOT backupcentral 1453158218', deviceState=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=18CA48E0, cbData=19B04E90 2016-01-18T23:03:40.519Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : open successful (65557) size = 188978561024, hd = 0. Type 3 2016-01-18T23:03:40.519Z| vmx| I120: DISKLIB-VMFS : "/vmfs/volumes/5314ce6f-2d501c33-de3c-a41f72000c75/myrhel61/solr-flat.vmdk" : closed. 2016-01-18T23:03:40.522Z| vcpu-0| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED 2016-01-18T23:03:58.715Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out. 2016-01-18T23:04:03.814Z| vmx| I120: Tools: Tools heartbeat timeout. 2016-01-18T23:04:13.716Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out. 2016-01-18T23:04:13.716Z| vmx| I120: GuestRpc: app toolbox's second ping timeout; assuming app is down 2016-01-18T23:04:13.717Z| vmx| I120: ToolsBackup: changing quiesce state: STARTED -> DONE 2016-01-18T23:04:13.717Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: done with snapshot 'NBU_SNAPSHOT backupcentral 1453158218': 0 2016-01-18T23:04:13.717Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40). 2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Reinitializing Channel 0(toolbox) 2016-01-18T23:04:13.718Z| vmx| I120: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed 2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Channel 0 reinitialized. 2016-01-18T23:04:13.718Z| vmx| I120: GuestRpc: Channel 0 reinitialized.
Does anyone have any idea to solve this problem?
Thanks in advance.
PD: some other RHEL5 virtual machine gets also frozen.
update 1
The log files of the SYMCquiesce software do not exist on the dates when a "freezing" backup session ocurred. On the days when the backup session worked successfully, the log files contain the following message:
Unable to allocate shared memory
Here you see a calendar of the last backup sessions:
date reboot log file ------------ ------ -------- Feb 22, 2016 no yes Feb 29, 2016 yes no Mar 7, 2016 no yes Mar 14, 2016 no no Mar 21, 2016 yes no Mar 28, 2016 no yes Apr 4, 2016 yes no
On March 14th everything went fine because no backup session was launched.
ultimate update
After migrating to VMware 5.5 version, this problem disappeared.