I am trying to set up a test cluster on a Xen server with two paravirtualized CentOS 5.4 machines. I am using Pacemaker and Corosync, following the instructions at http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf and other sites. When I try to stop the corosync service manually, about 80% of the time the whole VM locks up with the message "Waiting for corosync services to unload" and I am forced to shut the machine down manually. The remaining 20% of the time, the VM keeps responding and appends dots to that message, but the service never actually stops. I could not find many resources online about this particular error. Any ideas? Thanks in advance.
1 Answer
Might this be a STONITH-related action? Does the behaviour differ if you run kill -9 <corosync-pid>?
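If you want to try this in a controlled way, here is a minimal sketch of one way to do it: send SIGTERM first, wait a few seconds, and only fall back to kill -9 if the process is still there. The function name term_then_kill and the 10-second default are my own placeholders, not part of corosync or Pacemaker, and I am assuming pidof is available (it normally is on CentOS). Keep in mind that SIGKILL gives the process no chance to clean up.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): ask a process to exit with
# SIGTERM, then escalate to SIGKILL if it is still present after a timeout.
# Usage: term_then_kill <pid> [timeout-seconds]
term_then_kill() {
    pid=$1
    timeout=${2:-10}   # assumed default, adjust as needed

    # kill -0 delivers no signal; it only checks that the PID exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "not running"
        return 0
    fi

    kill -TERM "$pid" 2>/dev/null

    i=0
    while [ "$i" -lt "$timeout" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "stopped after SIGTERM"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done

    # Still present after the timeout: this is the kill -9 step.
    kill -KILL "$pid" 2>/dev/null
    echo "sent SIGKILL"
}

# Example call, assuming a single corosync process:
# term_then_kill "$(pidof corosync)" 10
```

Comparing what happens after the SIGTERM step with what happens after the SIGKILL step should show whether the hang is in corosync's own shutdown path.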

weeheavy
- STONITH is (temporarily) disabled, according to the instructions I followed. I'll try your suggestion and report back here, but the problem is that, most of the time, the VM locks up completely and I can't access it in any way, so I'm forced to restart it... – Antipop May 03 '10 at 14:27
- I've been able to test your suggestion: while rebooting one of the VMs, corosync stopped responding and I managed to kill the process with the -9 switch. As soon as I did this, the machine completed the reboot and came up normally. Still no clue about the cause, though... – Antipop May 04 '10 at 12:53
- I think this has to do with one node (or rather Corosync on it) going down, so the heartbeat between the nodes is cut. I believe you need to "force-kill" a node so that the other one really notices the loss. Not sure if this will help you, though. – weeheavy May 04 '10 at 13:29