RHEL Cluster Suite

Question

We have set up a 2-node cluster with a SAN box. Our configuration is: IBM BladeCenter with HS22 blades, a T3400 SAN box, and a SAN switch. I have tried Cluster Suite on RHEL 5.2, RHEL 5.3, and RHEL 5.4. I can reboot the nodes using luci, I can fence both servers, and I can even relocate the services from node 1 to node 2.

The issue is this: if I check clustat on node 1, it shows all the services, and the cluster owner is node 1. If I stop the network service on node 1, all the services relocate to node 2 and node 1 powers off. When I reboot node 1, it rejoins the cluster, and at that point node 2 owns all the services as well as the cluster. But if I then stop the network service on node 2, the services don't relocate back to node 1, and in /var/log I can see "52 failed to changed RG status". Has anyone come across an issue like this? If so, what is the workaround?

Thank you so much, everyone. I got this working!

Can someone tell me why my question was given a -1? If you don't like it, please let me know. – Rajat, Nov 02 '09 at 18:14
Please take some time to properly format your question. If you expect people to take time to help you, make their job as easy as possible. — MikeyB, Nov 02 '09 at 18:17
Odd how this last comment of yours makes actual sense yet your question makes none. Fancy telling us, in your inimitable style, how it's now working so as to benefit mankind in some way? — Chopper3, Oct 20 '10 at 16:03

score 1 · Answer 1 · answered Feb 08 '10 at 21:49


If the network service goes down on a node, that node goes into an "unknown" state. RHCS has no way of knowing whether the host actually died or just became temporarily unresponsive. If you have a fence mechanism in place, the cluster can fence the host, which also tells RHCS that the node is definitely down, so its services can be moved to another node. If the services simply restarted elsewhere without fencing and the original host then got its network back, you would have the same service running on both nodes, accessing the same files on the SAN and corrupting them.
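The fence-before-failover rule described above can be illustrated with a toy Python model. This is not how RHCS is implemented internally; `Node`, `NodeState`, and `fail_over` are hypothetical names, and the `fence_ok` flag stands in for the result of a real fence agent call. The point it shows is that services only move once the failed node is confirmed down.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"
    UNKNOWN = "unknown"  # stopped responding: could be dead, could be cut off
    FENCED = "fenced"    # confirmed down by a fence device


@dataclass
class Node:
    name: str
    state: NodeState = NodeState.ONLINE
    services: list[str] = field(default_factory=list)


def fail_over(failed: Node, survivor: Node, fence_ok: bool) -> bool:
    """Move services from `failed` to `survivor` only after fencing succeeds.

    `fence_ok` is a hypothetical stand-in for the outcome of a fence agent.
    Returns True if the services were relocated.
    """
    # The cluster cannot tell a crashed node from an unreachable one.
    failed.state = NodeState.UNKNOWN
    if not fence_ok:
        # Without confirmation that the node is down, starting its services
        # on the survivor could leave both nodes writing to the same SAN volume.
        return False
    failed.state = NodeState.FENCED
    survivor.services.extend(failed.services)
    failed.services.clear()
    return True
```

For example, with `node1 = Node("node1", services=["web"])` and `node2 = Node("node2")`, calling `fail_over(node1, node2, fence_ok=False)` leaves "web" on node1 in the UNKNOWN state; only a successful fence moves it to node2.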


dyasny



Zombie thread wants brains... BRRAAAIIINNNSSS!! =) – Wesley Feb 08 '10 at 21:55
I _have_ to start checking out question dates :) – dyasny Feb 08 '10 at 22:27

score 1 · Accepted Answer · answered Nov 02 '09 at 18:29


I don't have any direct experience with RH clustering but, from your description, it sounds like node 1 isn't re-joining the cluster correctly after you reboot it.

As a starting point, I'd check that all the appropriate services are set to start automatically on node 1, but before I do that, I'd clean up your question, as it's almost unreadable in its current form.
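One way to act on that first check is to look at `chkconfig --list` output on node 1. Below is a minimal Python sketch that parses that output and reports cluster services not enabled in runlevels 3 and 5. It assumes the RHEL 5 SysV format (`name 0:off 1:off 2:on ...`), and the `REQUIRED` tuple is an assumption about which services matter; a setup using CLVM or a quorum disk would also need `clvmd` or `qdiskd`.

```python
# Assumed minimal set of cluster services; extend for your own setup.
REQUIRED = ("cman", "rgmanager")


def parse_chkconfig(output: str) -> dict[str, dict[int, bool]]:
    """Parse `chkconfig --list` lines like 'cman  0:off 1:off 2:on 3:on ...'."""
    result: dict[str, dict[int, bool]] = {}
    for line in output.splitlines():
        parts = line.split()
        # SysV service lines are a name plus seven 'N:on/off' fields;
        # this skips blank lines and the xinetd section.
        if len(parts) != 8 or ":" not in parts[1]:
            continue
        levels: dict[int, bool] = {}
        for item in parts[1:]:
            level, _, status = item.partition(":")
            levels[int(level)] = status == "on"
        result[parts[0]] = levels
    return result


def missing_autostart(output: str, runlevels=(3, 5)) -> list[str]:
    """Return required services not enabled in every given runlevel."""
    table = parse_chkconfig(output)
    return [
        svc for svc in REQUIRED
        if not all(table.get(svc, {}).get(rl, False) for rl in runlevels)
    ]
```

On the node itself you could feed it `subprocess.run(["chkconfig", "--list"], capture_output=True, text=True).stdout`; a service that is missing from the output entirely is reported the same way as one that is switched off.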

There appears to be a bug (sort of) related to this over at Red Hat's Bugzilla, too.


RainyRat


Thank you so much for your time, and please excuse my English. I think the bug you are pointing to is an old one. I have tested on all the versions: the bug is reported against 5.2, and I am using the same cman RPM mentioned in the bug, but I still have the same problem. Again, sorry for my English. – Rajat Nov 03 '09 at 04:45

score 1 · Answer 3 · answered Nov 02 '09 at 20:58


I bet I'll receive some downvotes for this, but my experience with RHCS is that it basically doesn't work at all. I tried and tried and tried to make a simple 3-node cluster work with ricci and luci and ended up just giving up. My searches turned up similar experiences and a common theme that RHCS is not ready for production deployments. I was sometimes able to join a couple of servers to the cluster, but as soon as I tried to join another node, it just failed with very little information in the logs.

I ended up moving towards Pacemaker backed with a DRBD filesystem and found it is more flexible and just works. My advice is to use Pacemaker.


Aaron Brown


Have you tried contacting Red Hat support about that? The usual culprit is the fence scripts, which might require some tweaking for your particular config. – dyasny Feb 08 '10 at 21:46
I had not even gotten to the point of dealing with fencing - we're talking about just joining 3 machines to the same cluster...a process that should require about 10 minutes of time. It fails almost every time. Regardless, I have found pacemaker to be a more robust solution (1. it works, 2. it is as or more flexible) so am not really worried about RHCS anymore. – Aaron Brown Feb 19 '10 at 22:39

