I have 2 server Asterisk on centOS in HA group with peacemaker.The cluster is in active passive mode with configured resource groups.The problem is from some time the server switch from the primary to secondary with no obvious reason for me.The log begins with:
PBX1 attrd[2503]: notice: attrd_perform_update: Sent update 327: pingd=0
followed with stopping of the resource groups and
Oct 3 19:30:39 PBX1 Filesystem(DrbdFS)[14168]: INFO: Trying to unmount /drbd
Oct 3 19:30:39 PBX1 Filesystem(DrbdFS)[14168]: INFO: unmounted /drbd successfully
Oct 3 19:30:39 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation DrbdFS_stop_0 (call=2046, rc=0, cib-update=2735, confirmed=true) ok
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 32: demote drbd_drbd0_demote_0 on pbx1(local)
Oct 3 19:30:39 PBX1 kernel: block drbd0: role( Primary -> Secondary )
Oct 3 19:30:39 PBX1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Oct 3 19:30:39 PBX1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 3 19:30:39 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_demote_0 (call=2050, rc=0, cib-update=2736, confirmed=true) ok
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 159: notify drbd_drbd0_post_notify_demote_0 on pbx2
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 161: notify drbd_drbd0_post_notify_demote_0 on pbx1 (local)
Oct 3 19:30:39 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_notify_0 (call=2053, rc=0, cib-update=0, confirmed=true) ok
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 156: notify drbd_drbd0_pre_notify_stop_0 on pbx2
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 157: notify drbd_drbd0_pre_notify_stop_0 on pbx1 (local)
Oct 3 19:30:39 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_notify_0 (call=2056, rc=0, cib-update=0, confirmed=true) ok
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 31: stop drbd_drbd0_stop_0 on pbx2
Oct 3 19:30:39 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 33: stop drbd_drbd0_stop_0 on pbx1 (local)
Oct 3 19:30:39 PBX1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Oct 3 19:30:39 PBX1 kernel: block drbd0: asender terminated
Oct 3 19:30:39 PBX1 kernel: block drbd0: Terminating drbd0_asender
Oct 3 19:30:39 PBX1 kernel: block drbd0: Connection closed
Oct 3 19:30:39 PBX1 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Oct 3 19:30:39 PBX1 kernel: block drbd0: receiver terminated
Oct 3 19:30:39 PBX1 kernel: block drbd0: Terminating drbd0_receiver
Oct 3 19:30:39 PBX1 kernel: block drbd0: disk( UpToDate -> Failed )
Oct 3 19:30:39 PBX1 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Oct 3 19:30:39 PBX1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 3 19:30:39 PBX1 kernel: block drbd0: disk( Failed -> Diskless )
Oct 3 19:30:39 PBX1 kernel: block drbd0: drbd_bm_resize called with capacity == 0
Oct 3 19:30:39 PBX1 kernel: block drbd0: worker terminated
Oct 3 19:30:39 PBX1 kernel: block drbd0: Terminating drbd0_worker
Oct 3 19:30:39 PBX1 attrd[2503]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_drbd0 (<null>)
Oct 3 19:30:39 PBX1 attrd[2503]: notice: attrd_perform_update: Sent delete 329: node=pbx1, attr=master-drbd_drbd0, id=<n/a>, set=(null), section=status
Oct 3 19:30:39 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_stop_0 (call=2059, rc=0, cib-update=2737, confirmed=true) ok
Oct 3 19:30:39 PBX1 crmd[2505]: notice: run_graph: Transition 2112 (Complete=63, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3294.bz2): Stopped
Oct 3 19:30:39 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:30:39 PBX1 pengine[2504]: notice: process_pe_message: Calculated Transition 2113: /var/lib/pacemaker/pengine/pe-input-3295.bz2
Oct 3 19:30:39 PBX1 crmd[2505]: notice: run_graph: Transition 2113 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3295.bz2): Complete
Oct 3 19:30:39 PBX1 crmd[2505]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 3 19:30:41 PBX1 attrd[2503]: notice: attrd_perform_update: Sent update 333: pingd=1000
Oct 3 19:30:41 PBX1 crmd[2505]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 3 19:30:41 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:30:41 PBX1 pengine[2504]: notice: LogActions: Start drbd_drbd0:0#011(pbx2)
Oct 3 19:30:41 PBX1 pengine[2504]: notice: process_pe_message: Calculated Transition 2114: /var/lib/pacemaker/pengine/pe-input-3296.bz2
Oct 3 19:30:41 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:30:41 PBX1 pengine[2504]: notice: LogActions: Start drbd_drbd0:0#011(pbx1)
Oct 3 19:30:41 PBX1 pengine[2504]: notice: LogActions: Start drbd_drbd0:1#011(pbx2)
Oct 3 19:30:41 PBX1 pengine[2504]: notice: process_pe_message: Calculated Transition 2115: /var/lib/pacemaker/pengine/pe-input-3297.bz2
Oct 3 19:30:41 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 7: start drbd_drbd0_start_0 on pbx1 (local)
Oct 3 19:30:41 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 9: start drbd_drbd0:1_start_0 on pbx2
Oct 3 19:30:41 PBX1 kernel: block drbd0: Starting worker thread (from cqueue [2046])
Oct 3 19:30:41 PBX1 kernel: block drbd0: disk( Diskless -> Attaching )
Oct 3 19:30:41 PBX1 kernel: block drbd0: Found 4 transactions (192 active extents) in activity log.
Oct 3 19:30:41 PBX1 kernel: block drbd0: Method to ensure write ordering: flush
Oct 3 19:30:41 PBX1 kernel: block drbd0: drbd_bm_resize called with capacity == 28411864
Oct 3 19:30:41 PBX1 kernel: block drbd0: resync bitmap: bits=3551483 words=55492 pages=109
Oct 3 19:30:41 PBX1 kernel: block drbd0: size = 14 GB (14205932 KB)
Oct 3 19:30:41 PBX1 kernel: block drbd0: bitmap READ of 109 pages took 13 jiffies
Oct 3 19:30:41 PBX1 kernel: block drbd0: recounting of set bits took additional 1 jiffies
Oct 3 19:30:41 PBX1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 3 19:30:41 PBX1 kernel: block drbd0: disk( Attaching -> UpToDate )
Oct 3 19:30:41 PBX1 kernel: block drbd0: attached to UUIDs 760BE94B0D6CDF8A:0000000000000000:819257FD75678FC9:819157FD75678FC9
Oct 3 19:30:41 PBX1 kernel: block drbd0: conn( StandAlone -> Unconnected )
Oct 3 19:30:41 PBX1 kernel: block drbd0: Starting receiver thread (from drbd0_worker [14408])
Oct 3 19:30:41 PBX1 kernel: block drbd0: receiver (re)started
Oct 3 19:30:41 PBX1 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 3 19:30:41 PBX1 attrd[2503]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_drbd0 (1000)
Oct 3 19:30:41 PBX1 attrd[2503]: notice: attrd_perform_update: Sent update 336: master-drbd_drbd0=1000
Oct 3 19:30:41 PBX1 attrd[2503]: notice: attrd_perform_update: Sent update 338: master-drbd_drbd0=1000
Oct 3 19:30:41 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_start_0 (call=2062, rc=0, cib-update=2741, confirmed=true) ok
Oct 3 19:30:41 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 106: notify drbd_drbd0_post_notify_start_0 on pbx1 (local)
Oct 3 19:30:41 PBX1 crmd[2505]: notice: te_rsc_command: Initiating action 107: notify drbd_drbd0:1_post_notify_start_0 on pbx2
Oct 3 19:30:41 PBX1 crmd[2505]: notice: process_lrm_event: LRM operation drbd_drbd0_notify_0 (call=2065, rc=0, cib-update=0, confirmed=true) ok
Oct 3 19:30:41 PBX1 crmd[2505]: notice: run_graph: Transition 2115 (Complete=10, Pending=0, Fired=0, Skipped=2, Incompl
the part for ping from pcs config is
Attributes: drbd_resource=drbd0
Operations: monitor interval=10s (drbd_drbd0-monitor-interval-10s)
Clone: Connectivity
Resource: p_ping (class=ocf provider=pacemaker type=ping)
Attributes: host_list="10.66.9.4 10.66.9.5 10.66.9.11 10.66.9.252 10.66.9.253 10.66.9.254" multiplier=1000 dampen=5s
Operations: monitor interval=1s (p_ping-monitor-interval-1s)
Important info is that only 10.66.9.11 is alive (i'am planning to clear the invalid addr. but this is other story) from those addresses because of the redesign witch took place before an year ago but the problem is seen from month or two max.Could it be from that or please say if you have any other idea. P.S Cables are good and their is no loss of icmp to 10.66.9.11 interface according to cacti witch configured for monitoring
Oct 3 19:18:31 PBX1 asterisk(asterisk)[1338]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:19:01 PBX1 asterisk(asterisk)[1960]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:19:14 PBX1 Xinetd(Xinetd)[2200]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:19:32 PBX1 asterisk(asterisk)[2347]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:20:02 PBX1 asterisk(asterisk)[2994]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:20:28 PBX1 dhcpd: isc-dhcpd-4.1.1-P1
Oct 3 19:20:32 PBX1 asterisk(asterisk)[3369]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:21:02 PBX1 asterisk(asterisk)[3970]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:21:08 PBX1 crmd[2505]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 3 19:21:08 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:21:08 PBX1 pengine[2504]: notice: process_pe_message: Calculated Transition 2110: /var/lib/pacemaker/pengine/pe-input-3292.bz2
Oct 3 19:21:08 PBX1 crmd[2505]: notice: run_graph: Transition 2110 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3292.bz2): Complete
Oct 3 19:21:08 PBX1 crmd[2505]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 3 19:21:14 PBX1 Xinetd(Xinetd)[4176]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:21:32 PBX1 asterisk(asterisk)[4318]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:22:02 PBX1 asterisk(asterisk)[4989]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:22:28 PBX1 dhcpd: isc-dhcpd-4.1.1-P1
Oct 3 19:22:32 PBX1 asterisk(asterisk)[5352]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:23:02 PBX1 asterisk(asterisk)[5939]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:23:14 PBX1 Xinetd(Xinetd)[6160]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:23:32 PBX1 asterisk(asterisk)[6302]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:24:02 PBX1 asterisk(asterisk)[6933]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:24:28 PBX1 dhcpd: isc-dhcpd-4.1.1-P1
Oct 3 19:24:32 PBX1 asterisk(asterisk)[7276]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:25:02 PBX1 asterisk(asterisk)[7895]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:25:14 PBX1 Xinetd(Xinetd)[8118]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:25:32 PBX1 asterisk(asterisk)[8259]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns.isc.afilias-nst.info/A/IN': 2a01:8840:6::1#53
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns.isc.afilias-nst.info/AAAA/IN': 2a01:8840:6::1#53
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns1.isc.ultradns.net/A/IN': 2001:dcd:4::2#53
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns2.isc.ultradns.net/A/IN': 2001:dcd:4::2#53
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns1.isc.ultradns.net/AAAA/IN': 2001:dcd:4::2#53
Oct 3 19:25:37 PBX1 named[27202]: error (network unreachable) resolving 'ns2.isc.ultradns.net/AAAA/IN': 2001:dcd:4::2#53
Oct 3 19:26:02 PBX1 asterisk(asterisk)[8873]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:26:28 PBX1 dhcpd: isc-dhcpd-4.1.1-P1
Oct 3 19:26:32 PBX1 asterisk(asterisk)[9233]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:27:02 PBX1 asterisk(asterisk)[9841]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:27:14 PBX1 Xinetd(Xinetd)[10088]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:27:32 PBX1 asterisk(asterisk)[10230]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:28:02 PBX1 asterisk(asterisk)[10862]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:28:28 PBX1 dhcpd: isc-dhcpd-4.1.1-P1
Oct 3 19:28:32 PBX1 asterisk(asterisk)[11224]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:29:03 PBX1 asterisk(asterisk)[11810]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:29:14 PBX1 Xinetd(Xinetd)[12031]: INFO: checking "disable" in /etc/xinetd.d/tftp
Oct 3 19:29:33 PBX1 asterisk(asterisk)[12174]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:30:03 PBX1 asterisk(asterisk)[12818]: INFO: 0 active channels 0 active calls 5999 calls processed
Oct 3 19:30:23 PBX1 attrd[2503]: notice: attrd_perform_update: Sent update 327: pingd=0
Oct 3 19:30:23 PBX1 crmd[2505]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 3 19:30:23 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop drbd_drbd0:0#011(pbx2..bg)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: process_pe_message: Calculated Transition 2111: /var/lib/pacemaker/pengine/pe-input-3293.bz2
Oct 3 19:30:23 PBX1 pengine[2504]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop drbd_drbd0:0#011(pbx2..bg)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Demote drbd_drbd0:1#011(Master -> Stopped pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop ClusPBXIP#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop ClusNetIP#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop ClusPBXIP-R#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop ClusPBXIP-R2#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop DSKPBXIP-R#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop Fixr_ClusPBXIP#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop DSKPBXIP-R5#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop DSKPBXIP-R6#011(pbx1)
Oct 3 19:30:23 PBX1 pengine[2504]: notice: LogActions: Stop DSKPBXIP-R7#011(pbx1)