
I have a master/master configuration with 2 public IPs (1.1.1.66, 1.1.1.70) shared between 2 nodes. The idea is that each node holds one public IP, and if one of the nodes goes down, the other one adopts its public IP and starts serving from both IPs.

The configuration also has a cloned gateway (ocf:heartbeat:Route) resource which is supposed to start after the public IPs.

I have this problematic scenario:

  1. 1st node is alive.
  2. 2nd node is dead.
  3. 1st node has 2 public IPs attached to it.
  4. 2nd node comes back up.
  5. Cloned gateway (ocf:heartbeat:Route) tries to start on the 2nd node even though the public IP is still active on the 1st node. This leads to a failure.

Failed Actions: default_gw_start_0 on haproxy-02 'unknown error' (1): call=32, status=complete, exitreason='default_gw Failed to add network route: to 0.0.0.0/0 dev eth0 via 1.1.1.65', last-rc-change='Thu Dec 26 01:03:36 2019', queued=0ms, exec=38ms
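The failure is consistent with the gateway not being directly reachable while eth0 on the 2nd node holds no address in the public /29. A minimal sketch of the subnet arithmetic, assuming plain POSIX shell; the helper names `ip_to_int` and `same_subnet` and the address 10.0.0.5 are hypothetical and only used for illustration:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 and gateway $2 share the same /$3 network.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Both public IPs sit in the same /29 as the gateway 1.1.1.65.
same_subnet 1.1.1.66 1.1.1.65 29 && echo "1.1.1.66: gateway is on-link"
same_subnet 1.1.1.70 1.1.1.65 29 && echo "1.1.1.70: gateway is on-link"
# Hypothetical internal address: not in the gateway's network.
same_subnet 10.0.0.5 1.1.1.65 29 || echo "10.0.0.5: gateway is not on-link"
```

On the node itself, `ip -4 addr show dev eth0` and `ip route show default` show whether an address in that network is present before the route is added.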

How can I make the 2nd node wait for the public IP to return before attempting to start the clone Route resource?

My configuration:

pcs resource create ip_1 ocf:heartbeat:IPaddr2 ip=1.1.1.66 cidr_netmask=29 nic="eth0" op monitor interval=30s
pcs resource create ip_2 ocf:heartbeat:IPaddr2 ip=1.1.1.70 cidr_netmask=29 nic="eth0" op monitor interval=30s

pcs resource create default_gw ocf:heartbeat:Route destination="0.0.0.0/0" device="eth0" gateway="1.1.1.65" family=""
pcs resource clone default_gw globally-unique="true"

pcs constraint order ip_1 then default_gw-clone
pcs constraint order ip_2 then default_gw-clone

pcs constraint colocation add ip_1 ip_2 -1

UPDATE: A temporary workaround is to avoid cloning the Route resource altogether and create a separate Route resource for each host (I called them default_gw_1 and default_gw_2):

pcs resource delete default_gw
pcs resource create default_gw_1 ocf:heartbeat:Route destination="0.0.0.0/0" device="eth0" gateway="1.1.1.65" family="ip4" --force
pcs resource create default_gw_2 ocf:heartbeat:Route destination="0.0.0.0/0" device="eth0" gateway="1.1.1.65" family="ip4" --force

# start ips first, then start routes
pcs constraint order ip_1 then default_gw_1
pcs constraint order ip_2 then default_gw_2

# otherwise routes try to start on nodes without the ip attached (and fail)
pcs constraint colocation add default_gw_1 with ip_1 INFINITY 
pcs constraint colocation add default_gw_2 with ip_2 INFINITY 

# lock routes to start on their specific nodes only
pcs constraint location add gateway_1 default_gw_1 haproxy-01 INFINITY resource-discovery=exclusive
pcs constraint location add gateway_2 default_gw_2 haproxy-02 INFINITY resource-discovery=exclusive

# this will fight with stickiness 
pcs constraint location ip_1 prefers haproxy-01=50
pcs constraint location ip_2 prefers haproxy-02=50 
pcs resource defaults resource-stickiness=100
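
Whether the location preferences (50) or the stickiness (100) win for a given resource can be checked on a live cluster instead of guessed. A minimal sketch, assuming it is run as root on one of the cluster nodes and that the Pacemaker command-line tools are installed alongside pcs:

```shell
# List all configured constraints (order, colocation, location).
pcs constraint

# Show where each resource is currently running and any failed actions.
pcs status

# Print the placement scores Pacemaker computes from the live cluster state.
crm_simulate -sL

# Clear a recorded start failure so the cluster retries the resource.
pcs resource cleanup default_gw_1
```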
Ola Ström
  • Why do you want to have the default gateway as a cluster resource? – c4f4t0r Feb 06 '20 at 11:19
  • I have 2 servers with 2 public IPs (on eth0). They are also connected through an internal LAN (eth1) with local IPs. If one of the servers goes down, the other one takes over its public IP. I can't start a default gateway without a public IP, which is why I want it as a resource. – Arkadiy Bolotov Feb 06 '20 at 17:46
