Error Message

Failed actions:
    httpd_monitor_5000 on abc-zabserver-b 'not running' (7): call=65,  status=complete, last-rc-change='Wed Jul 15 21:44:43 2015', queued=0ms, exec=8ms

pcs status

[root@abc-zabserver-b ~]# pcs status
Cluster name: abc-zabvip
Last updated: Wed Jul 15 21:50:52 2015
Last change: Wed Jul 15 20:38:07 2015
Stack: cman
Current DC: abc-zabserver-b - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
3 Resources configured


Online: [ abc-zabserver-a abc-zabserver-b ]

Full list of resources:

Resource Group: zabbix-cluster
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started abc-zabserver-b
     zabbix-server      (lsb:zabbix-server):    Started abc-zabserver-b
     httpd      (lsb:httpd):    Started abc-zabserver-b

Failed actions:
    httpd_monitor_5000 on abc-zabserver-b 'not running' (7): call=65,  status=complete, last-rc-change='Wed Jul 15 21:44:43 2015', queued=0ms, exec=8ms

Resource configuration

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.99.122.69    cidr_netmask=24 op monitor interval=5s
pcs property set stonith-enabled=false
pcs resource create zabbix-server lsb:zabbix-server op monitor interval=5s
pcs resource create httpd lsb:httpd op monitor interval=5s
pcs resource group add zabbix-cluster ClusterIP zabbix-server httpd
pcs property set no-quorum-policy=ignore
pcs property set default-resource-stickiness="100"

pcs config show

[root@abc-zabserver-b ~]# pcs config show
Cluster Name: abc-zabvip
Corosync Nodes:
 abc-zabserver-a abc-zabserver-b
Pacemaker Nodes:
 abc-zabserver-a abc-zabserver-b

Resources:
 Group: zabbix-cluster
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.99.122.69 cidr_netmask=24
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
           stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
           monitor interval=5s (ClusterIP-monitor-interval-5s)
  Resource: zabbix-server (class=lsb type=zabbix-server)
   Operations: monitor interval=5s (zabbix-server-monitor-interval-5s)
  Resource: httpd (class=lsb type=httpd)
   Operations: monitor interval=5s (httpd-monitor-interval-5s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.11-97629de
 default-resource-stickiness: 100
 no-quorum-policy: ignore
 stonith-enabled: false

cluster.conf

[root@abc-zabserver-b ~]# cat /etc/cluster/cluster.conf
<cluster config_version="9" name="abc-zabvip">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="abc-zabserver-a" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-a"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="abc-zabserver-b" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="abc-zabserver-b"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" port="5405" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
John Test
  • What say apache logs? Why not use the RA (`ocf:heartbeat:apache`) for apache? – Federico Sierra Jul 16 '15 at 03:31
  • @FedericoSierra I tried to follow this (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_update_the_configuration.html) but I don't have "crm" command. I have a whole bunch of commands that start with crm such as crmadmin crm_diff etc. It would've been better if I could just do this with pcs so all config was in the same place. – John Test Jul 16 '15 at 05:26
  • Apache error log -> http://pastebin.com/cJxwGjQ0 /var/log/messages -> http://pastebin.com/NTQLmamm – John Test Jul 16 '15 at 05:29
  • See http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_configure_the_cluster.html – Federico Sierra Jul 16 '15 at 12:26
  • @JohnTest I don't think you mentioned distro, but assuming it's some RedHat flavor, you can pick packages for crmsh here: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ You'll need crmsh, pssh, and python-pssh, and python-dateutil (which you can get from your distros repos). – Matt Kereczman Jul 16 '15 at 15:18

2 Answers

The httpd resource does appear to be running (based on the pcs status output you've shown). Perhaps something stopped the service while Pacemaker was monitoring it, which would throw the error you see above, and trigger a recovery.

If you grep your logs on the DC (per your output, "Current DC: abc-zabserver-b - partition with quorum") for "LogActions", you should see any Start/Stop/Recover/Restart/Leave actions Pacemaker performed on the resources.
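As a rough sketch of that search, assuming the cluster logs to /var/log/messages (the file linked in the comments; a different syslog setup may write elsewhere), a small helper can pull out the LogActions lines. The function name show_log_actions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print Pacemaker "LogActions" lines from a log file.
# The default path is an assumption; pass another file as the first argument.
show_log_actions() {
    log_file="${1:-/var/log/messages}"
    # -F treats the pattern as a fixed string rather than a regex.
    grep -F 'LogActions' "$log_file"
}
```

Running `show_log_actions | grep httpd` on the DC would then narrow the output to the httpd resource.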

If that was the case, you will want to make sure nothing except Pacemaker is managing these clustered services; Pacemaker expects to be the only thing starting and stopping these services.
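One common way a service ends up managed outside Pacemaker is being enabled at boot. The sketch below assumes a RHEL/CentOS 6 style system with chkconfig (the question does not state the distro, so treat that as a guess); the helper name enabled_at_boot is hypothetical. It reads one line of `chkconfig --list` output and succeeds if any runlevel is switched on.

```shell
#!/bin/sh
# Hypothetical helper: read `chkconfig --list <service>` output on stdin and
# succeed if any runlevel is marked "on".
enabled_at_boot() {
    grep -q ':on'
}

# Example use on each cluster node (assumes chkconfig is available):
#   chkconfig --list httpd | enabled_at_boot && chkconfig httpd off
#   chkconfig --list zabbix-server | enabled_at_boot && chkconfig zabbix-server off
```

Monitoring tools or cron jobs that restart httpd would need to be checked separately.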

You can clean up the error by running the following command:

# pcs resource cleanup httpd

The return code 7 usually means that the service wasn't running when Pacemaker checked its status.

http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
http://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
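To see what the init script itself reports, you can run its status action by hand and look at the exit code. The sketch below maps the status codes from the LSB specification linked above to their meanings; the function name lsb_status_meaning is made up for illustration. As far as I know, Pacemaker reports an LSB status of 3 ("not running") as its own rc 7, which is what shows up in the failed action.

```shell
#!/bin/sh
# Hypothetical helper: describe an LSB init-script `status` exit code.
lsb_status_meaning() {
    case "$1" in
        0) echo "program is running or service is OK" ;;
        1) echo "program is dead and /var/run pid file exists" ;;
        2) echo "program is dead and /var/lock lock file exists" ;;
        3) echo "program is not running" ;;
        4) echo "program or service status is unknown" ;;
        *) echo "reserved or non-standard code" ;;
    esac
}

# Example use on a node (assumes the init script lives in /etc/init.d):
#   /etc/init.d/httpd status; lsb_status_meaning $?
```

An init script that returns anything other than 0 while the service is up, or 3 while it is stopped, is not LSB compliant and will confuse the monitor operation.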

Matt Kereczman

I fixed it by uncommenting the status URL section in httpd.conf and creating the resource with the apache resource agent, as shown below. Make sure http://localhost/server-status is accessible before adding the resource.

pcs resource create httpd apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://localhost/server-status" op monitor interval=5s --group zabbix-cluster
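A quick way to confirm the status URL answers before creating the resource is a curl check. This is only a sketch: it assumes curl is installed on the node, and the helper name status_url_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check: succeed only if the status URL answers without an
# HTTP error. -f makes curl fail on HTTP errors (>= 400); -s hides progress.
status_url_ok() {
    url="${1:-http://localhost/server-status}"
    curl -fs -o /dev/null --max-time 5 "$url"
}

# Example use before running pcs resource create:
#   status_url_ok && echo "server-status reachable" || echo "fix httpd.conf first"
```

If the check fails, the monitor operation of the apache agent would most likely fail as well, since it is configured to poll the same URL.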
John Test