Pacemaker ClusterIP stops working after 15 minutes but is still running

Question

I am running Corosync and Pacemaker on a cman stack in an active-active setup that serves web pages, and I've hit a brick wall. I'm using the IPaddr2 resource agent to provide an IP that is used simultaneously by both nodes, and after about 15 minutes it stops working. According to pcs it is still running, and the CLUSTERIP rule is still in iptables, but the IP becomes unreachable. If I restart iptables, the cluster IP works for another fifteen minutes, then stops again. Here is my CIB config:

Resources:
 Master: RepoDataClone
  Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: RepoData (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=fstore
   Operations: start interval=0s timeout=240 (RepoData-start-timeout-240)
               promote interval=0s timeout=90 (RepoData-promote-timeout-90)
               demote interval=0s timeout=90 (RepoData-demote-timeout-90)
               stop interval=0s timeout=100 (RepoData-stop-timeout-100)
               monitor interval=30s (RepoData-monitor-interval-30s)
 Resource: UpdateService (class=lsb type=rp-doReplicate)
  Operations: monitor interval=30s (UpdateService-monitor-interval-30s)
 Clone: ClusterIP-clone
  Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=false
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.140.14.1 cidr_netmask=24 clusterip_hash=sourceip nic=eth0 broadcast=10.140.14.255 arp_interval=500 arp_bg=yes
   Meta Attrs: resource-stickiness=0
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
               stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
               monitor interval=5s (ClusterIP-monitor-interval-5s)
 Clone: Repo-clone
  Meta Attrs: interleave=false
  Resource: Repo (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost:443/server-status
   Operations: start interval=0s timeout=40s (Repo-start-timeout-40s)
               stop interval=0s timeout=60s (Repo-stop-timeout-60s)
               monitor interval=30s (Repo-monitor-interval-30s)
 Clone: RepoFS-clone
  Meta Attrs: interleave=true
  Resource: RepoFS (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/drbd1 directory=/repo fstype=gfs2
   Operations: start interval=0s timeout=60 (RepoFS-start-timeout-60)
               stop interval=0s timeout=60 (RepoFS-stop-timeout-60)
               monitor interval=10s (RepoFS-monitor-interval-10s)
 Clone: dlm-clone
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s (dlm-monitor-interval-30s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start ClusterIP-clone then start Repo-clone (kind:Mandatory) (id:order-ClusterIP-Repo-mandatory)
  promote RepoDataClone then start RepoFS-clone (kind:Mandatory) (id:order-RepoDataClone-RepoFS-mandatory)
  start RepoFS-clone then start Repo-clone (kind:Mandatory) (id:order-RepoFS-Repo-mandatory)
  start dlm-clone then start RepoFS-clone (kind:Mandatory) (id:order-dlm-RepoFS-mandatory)
  start RepoFS then start UpdateService (kind:Mandatory) (id:order-RepoFS-UpdateService-mandatory)
Colocation Constraints:
  Repo-clone with ClusterIP-clone (score:INFINITY) (id:colocation-Repo-ClusterIP-INFINITY)
  RepoFS-clone with dlm-clone (score:INFINITY) (id:colocation-RepoFS-dlm-INFINITY)
  RepoFS-clone with RepoDataClone (score:INFINITY) (id:colocation-RepoFS-RepoDataClone-INFINITY)

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.12-1.1.12+git20140723.483f48a
 default-resource-stickiness: 0
 expected-quorum-votes: 2
 last-lrm-refresh: 1432927197
 no-quorum-policy: freeze
 stonith-enabled: false

I've been searching all over and I can't seem to figure it out. Any idea what is causing my issue or how to fix it?
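For context on how one IP can be "used simultaneously by both nodes": with clusterip_hash=sourceip, IPaddr2 relies on the iptables CLUSTERIP target, which hashes each client's source address into one of clone-max buckets, and each node only answers packets in the buckets it owns. The Python sketch below models that idea only. It is an assumption-laden illustration: zlib.crc32 stands in for the kernel's jhash, so real bucket assignments will differ, and the client addresses are hypothetical.

```python
from __future__ import annotations

import ipaddress
import zlib


def clusterip_bucket(src_ip: str, total_nodes: int, hash_init: int = 0) -> int:
    """Map a client address to a 1-based bucket, in the style of --hashmode sourceip.

    ASSUMPTION: zlib.crc32 is a stand-in for the kernel's jhash, so the
    bucket a given address lands in will NOT match a real cluster.
    """
    packed = ipaddress.ip_address(src_ip).packed
    return zlib.crc32(packed, hash_init) % total_nodes + 1


def accepts(src_ip: str, owned_buckets: set[int], total_nodes: int) -> bool:
    """A node answers a packet only if it currently owns that packet's bucket."""
    return clusterip_bucket(src_ip, total_nodes) in owned_buckets


if __name__ == "__main__":
    # clone-max=2: each running ClusterIP instance owns one bucket.
    nodes = {"node1": {1}, "node2": {2}}
    # Hypothetical client addresses.
    for client in ["10.140.14.50", "10.140.14.51", "192.0.2.7"]:
        owners = [name for name, owned in nodes.items() if accepts(client, owned, 2)]
        print(client, "->", owners)
```

With both nodes up, every client is answered by exactly one node. If one node fails, clone-node-max=2 allows the survivor to run both ClusterIP instances and so own buckets {1, 2}.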

score 0 · Answer 1 · answered Jun 02 '15 at 22:43

By default, the IPaddr2 resource agent only checks whether the IP address is configured; you can confirm this by reading its source: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPaddr2
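To illustrate why such a monitor can keep reporting success while clients cannot reach the IP, here is a Python sketch of a check in the same spirit: it only looks for the address in `ip -o addr show` style output. This is not the agent's code (IPaddr2 is a shell script), and the node address 10.140.14.11 in the sample is hypothetical.

```python
import ipaddress


def ip_is_configured(ip_addr_output: str, ip: str, nic: str) -> bool:
    """Return True if `ip` appears on `nic` in `ip -o addr show` style output.

    This only proves the address is assigned locally; it says nothing about
    whether clients can actually reach it.
    """
    target = ipaddress.ip_address(ip)
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <nic> inet <addr>/<prefix> ..."
        if len(fields) < 4 or fields[1] != nic or fields[2] not in ("inet", "inet6"):
            continue
        if ipaddress.ip_interface(fields[3]).ip == target:
            return True
    return False


# Hypothetical output; 10.140.14.11 stands in for the node's own address.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 10.140.14.11/24 brd 10.140.14.255 scope global eth0
2: eth0    inet 10.140.14.1/24 brd 10.140.14.255 scope global secondary eth0
"""

if __name__ == "__main__":
    print(ip_is_configured(SAMPLE, "10.140.14.1", "eth0"))
```

A check like this stays green as long as the address is assigned, even if packets to it are being dropped further along the path.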

If you want to monitor connectivity using ICMP, see: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_moving_resources_due_to_connectivity_changes.html
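The approach in that document uses the ocf:pacemaker:ping agent, which pings a list of hosts and publishes the result as a node attribute (named pingd by default), plus a location rule that bans resources from nodes where that attribute is undefined or not positive. The Python sketch below only models that decision; it sends no ICMP itself, and the probe targets and the multiplier of 1000 are assumptions for illustration. Note that this measures each node's own outbound connectivity.

```python
from typing import Mapping, Optional

# Pacemaker treats 1,000,000 as INFINITY in constraint scores.
INFINITY = 1_000_000


def pingd_value(reachable: Mapping[str, bool], multiplier: int) -> int:
    """Value a ping-style agent would publish: multiplier x number of reachable hosts."""
    return multiplier * sum(1 for ok in reachable.values() if ok)


def connectivity_score(pingd: Optional[int]) -> int:
    """Rule 'pingd not_defined or pingd lte 0' gives -INFINITY, otherwise 0."""
    if pingd is None or pingd <= 0:
        return -INFINITY
    return 0


if __name__ == "__main__":
    # Hypothetical probe results for two ping targets on one node.
    probes = {"10.140.14.254": True, "10.140.15.254": False}
    value = pingd_value(probes, multiplier=1000)
    print(value, connectivity_score(value))
```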
