I have a Zabbix active/passive cluster built on Pacemaker and CMAN. "pcs status" reports a failed monitor action for zabbix-server (output below), and during failover the zabbix-server service does not come up. The floating IP moves over just fine, though.

OS: CentOS 6.6
Zabbix: 2.4

[root@abc-zabserver-b cluster]# rpm -qa | grep cman
cman-3.0.12.1-68.el6_6.1.x86_64
[root@abc-zabserver-b cluster]# rpm -qa | grep pacemaker
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
[root@abc-zabserver-b ~]# rpm -qa | grep corosync
corosync-1.4.7-1.el6.x86_64
corosynclib-1.4.7-1.el6.x86_64

Here is the error

[root@abc-zabserver-b cluster]# pcs status
Cluster name: abc-zabvip
Last updated: Mon Jul 13 08:01:57 2015
Last change: Thu Jul  2 17:01:48 2015
Stack: cman
Current DC: abc-zabserver-a - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
2 Resources configured

Online: [ abc-zabserver-a abc-zabserver-b ]

Full list of resources:

 Resource Group: zabbix-cluster
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started abc-zabserver-a
     zabbix-server      (lsb:zabbix-server):    Stopped

Failed actions:
    zabbix-server_monitor_5000 on abc-zabserver-a 'not running' (7):     call=541, status=complete, last-rc-change='Mon Jul 13 08:01:57 2015', queued=0ms, exec=0ms

Here is the cluster.conf

<cluster config_version="9" name="abc-zabvip">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="abc-zabserver-a" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-a"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="abc-zabserver-b" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-b"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
   <cman expected_votes="1" port="5405" transport="udpu" two_node="1"/>
  <fencedevices>
     <fencedevice agent="fence_pcmk" name="pcmk"/>
   </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

Here is /etc/sysconfig/cman

CMAN_QUORUM_TIMEOUT=0

Some other configs I did on this cluster

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.99.122.69 cidr_netmask=24 op monitor interval=5s
pcs property set stonith-enabled=false
pcs resource create zabbix-server lsb:zabbix-server op monitor interval=5s
pcs resource group add zabbix-cluster ClusterIP zabbix-server
pcs property set no-quorum-policy=ignore
pcs property set default-resource-stickiness="100"

Error in zabbix_server.log

listener failed: zbx_tcp_listen() fatal error: unable to serve on any address [[-]:10051]
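
A listener error like this one often means something is already bound to the port, though that is an assumption here and not something the log line proves. A minimal sketch for checking whether any socket is in LISTEN state on 10051, reading the kernel's /proc/net/tcp tables directly so it does not depend on netstat or lsof being installed. The function name port_is_listening is hypothetical, and the optional file arguments exist only so the check can be pointed at a saved copy of the table:

```shell
# Succeed if some socket is LISTENing on the given TCP port.
# $1 = port number; remaining args = proc table files to scan
# (defaults to /proc/net/tcp, plus /proc/net/tcp6 when readable).
port_is_listening() {
    port_hex=$(printf '%04X' "$1")
    shift
    if [ "$#" -eq 0 ]; then
        set -- /proc/net/tcp
        if [ -r /proc/net/tcp6 ]; then
            set -- "$@" /proc/net/tcp6
        fi
    fi
    # Column 2 is the local address as hex "ip:port"; column 4 is the
    # socket state, where 0A means LISTEN.
    awk -v p=":$port_hex" '
        $4 == "0A" && substr($2, length($2) - 4) == p { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$@"
}

# Usage on the node where the server refuses to start:
#   port_is_listening 10051 && echo "10051 is already taken"
```

If the port turns out to be taken while the init script says the service is stopped, leftover zabbix_server processes would be the first thing to look at.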

Zabbix server processes are running, but the init script reports the service as stopped

[root@abc-zabserver-b zabbix]# service zabbix-server status
zabbix_server is stopped
[root@abc-zabserver-b zabbix]# ps afx | grep -i zabbix
26835 pts/0    S+     0:00  |       \_ grep -i zabbix
 2867 ?        S      0:00 zabbix_server: poller #50 [connecting to the database]
 2926 ?        S      0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
 2962 ?        S      0:00 zabbix_server -c /etc/zabbi/zabbix_server.conf
[root@abc-zabserver-b zabbix]# service zabbix-server status
zabbix_server is stopped
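
Pacemaker manages an lsb: resource purely through the init script's start, stop and status actions, and it interprets the status exit code according to the LSB specification. Since the script above says "stopped" while zabbix_server processes exist, it may help to look at the numeric exit code as well as the message. A small sketch that decodes the code; lsb_status_meaning is a hypothetical helper name, not part of any package:

```shell
# Translate an LSB init script "status" exit code into its meaning.
lsb_status_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but pid file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        *) echo "unknown or reserved" ;;
    esac
}

# Usage on a cluster node (service name taken from the question):
#   service zabbix-server status; lsb_status_meaning $?
```

A code of 3 while processes are still alive would suggest the script has lost track of them, for example because the PID file it checks is missing or points elsewhere.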

pcs config show

[root@abc-zabserver-b zabbix]# pcs config show
Cluster Name: abc-zabvip
Corosync Nodes:
 abc-zabserver-a abc-zabserver-b
Pacemaker Nodes:
 abc-zabserver-a abc-zabserver-b

Resources:
 Group: zabbix-cluster
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.99.122.69 cidr_netmask=24
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
           stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
           monitor interval=5s (ClusterIP-monitor-interval-5s)
  Resource: zabbix-server (class=lsb type=zabbix-server)
   Operations: monitor interval=5s (zabbix-server-monitor-interval-5s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 default-resource-stickiness: 100
 no-quorum-policy: ignore
 stonith-enabled: false
John Test

1 Answer

A mismatch between the installed Zabbix binaries and the config file was causing this. Darndest thing!
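
The exact directives that differed are not known, so the following is only a sketch of one way to spot this kind of drift after a package update. When rpm updates a package whose config file was edited locally, it typically leaves either a .rpmnew file (the new default, not applied) or a .rpmsave file (the old one, set aside) next to it. The helper name find_rpm_leftovers is hypothetical, and /etc/zabbix in the usage line is an assumed config directory:

```shell
# List config files that rpm left behind during an update.
# $1 = directory to search
find_rpm_leftovers() {
    find "$1" -type f \( -name '*.rpmnew' -o -name '*.rpmsave' \) -print
}

# Usage (directory is an assumption; adjust to where the config lives):
#   find_rpm_leftovers /etc/zabbix
#   diff -u /etc/zabbix/zabbix_server.conf /etc/zabbix/zabbix_server.conf.rpmnew
```

Any file it prints is worth diffing against the config the server is actually started with.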

John Test
  • What exactly was the mismatch? – kasperd Jul 16 '15 at 05:42
  • @kasperd I am not exactly sure, but the config files changed between Zabbix updates and the newer Zabbix didn't want to work with the older config file. This was causing issues with start/stop and leaving processes behind. – John Test Jul 16 '15 at 21:16
  • @kasperd I guess both of these were related: http://serverfault.com/questions/706044/zabbix-server-not-starting-listener-failed-zbx-tcp-listen-fatal-error-unable – John Test Jul 16 '15 at 21:19