I'm using Pacemaker 1.1.13 and Corosync 2.3.4 on CentOS 7.
I have a problem with a Master/Slave resource. These are the meta attributes for my resource:
migration-threshold=1
failure-timeout=10s
but when the resource goes down, there is only one attempt to start it. The documentation says that failure-timeout=10s should reset the failcount after 10 seconds, but that does not happen, so the resource never starts again.
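To make my expectation concrete, here is a minimal sketch in Python of how I read the documentation. It is only a toy model, not Pacemaker's implementation: the class and method names are made up for illustration, and time is passed in by hand as plain seconds.

```python
# Toy model of the behaviour I expect from the documentation.
# NOT Pacemaker code; FailureTracker and its methods are hypothetical names.

class FailureTracker:
    def __init__(self, migration_threshold, failure_timeout):
        self.migration_threshold = migration_threshold
        self.failure_timeout = failure_timeout  # seconds
        self.failcount = 0
        self.last_failure = None  # time of the most recent failure

    def expire(self, now):
        # Forget old failures once failure_timeout has passed
        # without a new failure.
        if (self.last_failure is not None
                and now - self.last_failure >= self.failure_timeout):
            self.failcount = 0
            self.last_failure = None

    def record_failure(self, now):
        self.expire(now)
        self.failcount += 1
        self.last_failure = now

    def may_run_here(self, now):
        # The node is allowed again once failcount drops below the threshold.
        self.expire(now)
        return self.failcount < self.migration_threshold


tracker = FailureTracker(migration_threshold=1, failure_timeout=10)
tracker.record_failure(now=0)
print(tracker.may_run_here(now=5))   # False: failure is still counted
print(tracker.may_run_here(now=10))  # True: failure has expired
```

In other words, I expect the resource to be allowed to start again about 10 seconds after the failure, which is not what I see on the cluster.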
Do you know anything about this problem? Maybe I'm doing something wrong? The output of 'pcs config' is below:
Cluster Name: webcluster
Corosync Nodes:
 10.121.100.101 10.121.100.102
Pacemaker Nodes:
 pm-node1 pm-node2
Resources:
 Master: Services-master
  Meta Attrs: failure-timeout=10s
  Group: Services
   Meta Attrs: migration-threshold=1
   Resource: Test (class=ocf provider=scooty type=test)
    Operations: start interval=0s timeout=20 (Test-start-interval-0s)
                stop interval=0s timeout=20 (Test-stop-interval-0s)
                monitor interval=10 role=Master timeout=20 (Test-monitor-interval-10)
                monitor interval=11 role=Slave timeout=20 (Test-monitor-interval-11)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Resources Defaults:
 migration-threshold: 1
 failure-timeout: 10
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: webcluster
 dc-version: 1.1.13-10.el7_2.4-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1475145002
 no-quorum-policy: ignore
 start-failure-is-fatal: false
 stonith-enabled: false