I have set up a Linux cluster with Corosync/Pacemaker. The two cluster nodes are in the same subnet and share a virtual IP. Machines in that subnet can ping the virtual IP "135.121.192.104" successfully.
However, when I ping the virtual IP "135.121.192.104" from a machine in a different subnet, it does not respond. The other machines reside on the subnet "135.121.196.x".
On my machines, I have the following subnet mask in my ifcfg-eth0 file:
NETMASK=255.255.254.0
and below is my output for the crm configure show:
[root@h-008 crm]# crm configure show
node h-008 \
        attributes standby="off"
node h-009 \
        attributes standby="off"
primitive GAXClusterIP ocf:heartbeat:IPaddr2 \
        params ip="135.121.192.104" cidr_netmask="23" \
        op monitor interval="30s" clusterip_hash="sourceip"
clone GAXClusterIP2 GAXClusterIP \
        meta globally-unique="true" clone-node-max="2"
property $id="cib-bootstrap-options" \
        dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
and the output of the crm_mon status:
[root@h-009 crm]# crm_mon status --one-shot
non-option ARGV-elements: status
============
Last updated: Thu Jun 23 08:12:21 2011
Stack: openais
Current DC: h-008 - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ h-008 h-009 ]
Clone Set: GAXClusterIP2 (unique)
GAXClusterIP:0 (ocf::heartbeat:IPaddr2): Started h-008
GAXClusterIP:1 (ocf::heartbeat:IPaddr2): Started h-009
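As a sanity check on the addressing above, here is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the NETMASK in ifcfg-eth0 agrees with the cidr_netmask="23" on the resource, and that the remote host really is off-link. The variable names are mine, and 135.121.196.122 is the outside machine used in the traceroutes further down.

```python
import ipaddress

# Values copied from the question; variable names are illustrative only.
netmask = "255.255.254.0"                           # NETMASK in ifcfg-eth0
vip = ipaddress.ip_address("135.121.192.104")       # virtual IP
remote = ipaddress.ip_address("135.121.196.122")    # host on the other subnet

# Derive the cluster network from the VIP and the netmask.
# strict=False lets us pass a host address instead of the network address.
cluster_net = ipaddress.ip_network(f"{vip}/{netmask}", strict=False)

print(cluster_net)            # network that contains the VIP
print(cluster_net.prefixlen)  # prefix length implied by the netmask
print(vip in cluster_net)     # is the VIP on-link for the cluster nodes?
print(remote in cluster_net)  # is the remote host on-link?
```

If the prefix length printed here matches cidr_netmask and the remote host is reported as outside the network, the mask itself is probably not the problem, and traffic between the two subnets has to pass through a router.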
I am new to Linux HA cluster setup and have been unable to find the root cause of the issue. Is there any configuration I can check to diagnose this problem?
Additional comments:
Below is the output of "route -n"
[root@h-008 crm]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
135.121.192.0   0.0.0.0         255.255.254.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         135.121.192.1   0.0.0.0         UG    0      0        0 eth0
and below is the traceroute output from the cluster machine to the machine outside the cluster:
[root@h-008 crm]# traceroute 135.121.196.122
traceroute to 135.121.196.122 (135.121.196.122), 30 hops max, 40 byte packets
1 135.121.192.1 (135.121.192.1) 6.750 ms 6.967 ms 7.634 ms
2 135.121.205.225 (135.121.205.225) 12.296 ms 14.385 ms 16.101 ms
3 s2h-003.hpe.test.com (135.121.196.122) 0.172 ms 0.170 ms 0.170 ms
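To double-check that the routing table on h-008 explains the first hop in this traceroute, here is a rough sketch of a longest-prefix-match lookup over the three routes shown by "route -n". The `ROUTES` list and the `next_hop` helper are my own illustration, not part of any cluster tool, and the Genmask values are written as prefix lengths (255.255.254.0 as /23, 255.255.0.0 as /16, 0.0.0.0 as /0).

```python
import ipaddress

# Routes from `route -n` on h-008 as (destination/prefix, gateway).
# A gateway of 0.0.0.0 means the destination is directly connected.
ROUTES = [
    ("135.121.192.0/23", "0.0.0.0"),
    ("169.254.0.0/16", "0.0.0.0"),
    ("0.0.0.0/0", "135.121.192.1"),
]

def next_hop(dest):
    """Return the gateway picked by longest-prefix match, or 'on-link'."""
    addr = ipaddress.ip_address(dest)
    best = None  # (network, gateway) of the most specific match so far
    for cidr, gateway in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        return None  # no matching route at all
    return "on-link" if best[1] == "0.0.0.0" else best[1]

print(next_hop("135.121.196.122"))  # outside machine
print(next_hop("135.121.192.104"))  # virtual IP
```

The outside machine resolves to the default gateway 135.121.192.1, which is the first hop in the traceroute above, while the virtual IP is treated as directly connected. So the outbound path from the cluster node looks consistent with the routing table.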
Below is the traceroute output from the machine outside the cluster to the virtual IP 135.121.192.104:
[root@s2h-003 ~]# traceroute 135.121.192.104
traceroute to 135.121.192.104 (135.121.192.104), 30 hops max, 40 byte packets
1 135.121.196.1 (135.121.196.1) 10.558 ms 10.895 ms 11.556 ms
2 135.121.205.226 (135.121.205.226) 11.016 ms 12.797 ms 14.152 ms
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 *
However, a traceroute from the same machine to the real IP address of one of the cluster nodes succeeds:
[root@s2h-003 ~]# traceroute 135.121.192.102
traceroute to 135.121.192.102 (135.121.192.102), 30 hops max, 40 byte packets
1 135.121.196.1 (135.121.196.1) 4.994 ms 5.315 ms 5.951 ms
2 135.121.205.226 (135.121.205.226) 3.816 ms 6.016 ms 7.158 ms
3 h-009.msite.pr.hpe.test.com (135.121.192.102) 0.236 ms 0.229 ms 0.216 ms