
I configured nginx in a failover setup with corosync and pacemaker.

Everything looked fine until I tested the failover:

The virtual IP is moved to the other host, but the resource reverseproxy fails to start. If I do a debug-start, everything works fine.
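
By debug-start I mean starting the resource by hand, outside of cluster control, which also shows the resource agent's own output:

# pcs resource debug-start reverseproxy --full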

Please see my configuration:

# crm_verify -VL
error: unpack_rsc_op: Preventing reverseproxy from re-starting anywhere: operation start failed 'not configured' (6)
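
If I read the OCF spec right, exit code 6 is OCF_ERR_CONFIGURED, i.e. the agent itself considers its configuration invalid. To see where validation fails, the agent can also be run directly with the same parameters the cluster passes (paths assume a standard resource-agents install):

# OCF_ROOT=/usr/lib/ocf OCF_RESKEY_configfile=/etc/nginx/nginx.conf OCF_RESKEY_status10url=http://127.0.0.1 /usr/lib/ocf/resource.d/heartbeat/nginx start; echo "rc=$?"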

# pcs status
Cluster name: cluster_reverse
Last updated: Fri Oct 9 13:43:15 2015 Last change: Fri Oct 9 10:37:56 2015 by root via crm_resource on reverse1.domain.de
Stack: corosync
Current DC: reverse2.domain.de (version 1.1.13-a14efad) - partition with quorum
2 nodes and 2 resources configured
Online: [ reverse1.domain.de reverse2.domain.de ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started reverse1.domain.de
reverseproxy (ocf::heartbeat:nginx): Stopped
Failed Actions:
* reverseproxy_start_0 on reverse1.domain.de 'not configured' (6): call=13,   status=complete, exitreason='none',
last-rc-change='Fri Oct 9 12:31:41 2015', queued=0ms, exec=102ms

PCSD Status:
reverse1.domain.de: Online
reverse2.domain.de: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

# pcs resource show reverseproxy
Resource: reverseproxy (class=ocf provider=heartbeat type=nginx)
Attributes: configfile=/etc/nginx/nginx.conf status10url=http://127.0.0.1
Operations: start interval=0s timeout=40s (reverseproxy-start-timeout-40s)
stop interval=0s timeout=60s (reverseproxy-stop-timeout-60s)
monitor interval=1min (reverseproxy-monitor-interval-1min)

I can see the errors, but I have no idea how to solve them.
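
One thing I could still rule out on the failing node is a plain nginx configuration error, since a failed config test might be what the agent reports as 'not configured':

# nginx -t -c /etc/nginx/nginx.conf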

Any ideas would be appreciated.

Thanks in advance

EDIT: Here is the complete config:

# pcs config

Cluster Name: cluster_reverse
Corosync Nodes:
reverse1.domain.de reverse2.domain.de
Pacemaker Nodes:
reverse1.domain.de reverse2.domain.de
Resources:
Resource: virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=194.15.215.49 cidr_netmask=32
Operations: start interval=0s timeout=20s (virtual_ip-start-timeout-20s)
stop interval=0s timeout=20s (virtual_ip-stop-timeout-20s)
monitor interval=30s (virtual_ip-monitor-interval-30s)
Resource: reverseproxy (class=ocf provider=heartbeat type=nginx)
Attributes: configfile=/etc/nginx/nginx.conf status10url=http://127.0.0.1
Operations: start interval=0s timeout=40s (reverseproxy-start-timeout-40s)
stop interval=0s timeout=60s (reverseproxy-stop-timeout-60s)
monitor interval=1min (reverseproxy-monitor-interval-1min)
Stonith Devices:
Fencing Levels:
Location Constraints:
Resource: reverseproxy
Enabled on: reverse1.domain.de (score:50) (id:location-reverseproxy-reverse1.domain.de-50)
Ordering Constraints:
start virtual_ip then start reverseproxy (kind:Mandatory) (id:order-virtual_ip-reverseproxy-mandatory)
Colocation Constraints:
reverseproxy with virtual_ip (score:INFINITY) (id:colocation-reverseproxy-virtual_ip-INFINITY)
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_reverse
dc-version: 1.1.13-a14efad
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
  • Show pcs config and tell us if you have a constraint or a group that makes the IP and nginx reside on the same node. – c4f4t0r Oct 09 '15 at 15:35
  • Is nginx configured to listen on the failover IP? If the answer is yes, you need to use a group or colocation constraints. Anyway, show us the output of pcs config. – c4f4t0r Oct 12 '15 at 07:10
  • If your problem happens when you shut down one node, you need to think about configuring fencing (stonith). If the migration works when you use pcs resource move but not when you test the cluster, it's clear that you need to configure fencing. – c4f4t0r Oct 12 '15 at 07:16
  • I tested the following based on your hints. The starting point is that reverse1 is the active node, with virtual_ip started and reverseproxy started via debug-start. When I pcs cluster stop reverse1, reverse2 becomes the active node without a problem. After pcs cluster start on reverse1, reverse2 remains active. pcs cluster stop on reverse2 makes reverse1 take over the virtual IP, but it doesn't start reverseproxy. Again I need to debug-start it ... – pwe Oct 12 '15 at 08:03
  • You need to configure stonith; a cluster without fencing cannot work. – c4f4t0r Oct 12 '15 at 08:21
  • Do you really need to start nginx on failover? Having it running all the time probably won't consume too much memory, and you don't run the risk of it not starting when needed. – Fox Oct 12 '15 at 09:27
  • @c4f4t0r: It feels like a pain in the ass to allow the machines to restart themselves in the VMware cluster ... so it wouldn't be a proper solution. – pwe Oct 12 '15 at 09:34
  • @Fox I want maximum failover functionality. And if I start nginx on both nodes, the second one will complain because it wants to listen on the virtual IP, which it doesn't have. – pwe Oct 12 '15 at 09:37
  • I don't get why the failover reverse1 --> reverse2 **does** work but the way back doesn't, because reverse1 "can't start nginx" ... – pwe Oct 12 '15 at 09:39
  • @pwe Good luck using a cluster without fencing, that is real pain. – c4f4t0r Oct 13 '15 at 09:52
  • The only shared resource I have is the IP address. There is no filesystem or database shared in the background. Does this matter? – pwe Oct 15 '15 at 11:47
  • @pwe: If you want to start `Nginx` on both nodes (which might be a good idea anyway, because it reduces fail-over time), have a look at the following link to allow binding to non-local IPs: http://linux-ip.net/html/adv-nonlocal-bind.html – gxx Nov 10 '15 at 20:25
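
For reference, the non-local bind suggestion from the last comment boils down to one sysctl; the sysctl.d file name here is just an example:

# sysctl -w net.ipv4.ip_nonlocal_bind=1
# echo 'net.ipv4.ip_nonlocal_bind = 1' > /etc/sysctl.d/90-nonlocal-bind.conf

With this set, nginx can bind the virtual IP's listen address even on the node that does not currently hold it, so it could simply run on both nodes.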

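Likewise, regarding the fencing advice in the comments: on a VMware-backed cluster like this one, fencing would typically use the fence_vmware_soap agent. A minimal sketch, in which the vCenter address, credentials, and host map are all placeholders:

# pcs stonith create fence_reverse fence_vmware_soap \
    ipaddr=vcenter.domain.de login=fenceuser passwd=secret ssl_insecure=1 \
    pcmk_host_map="reverse1.domain.de:reverse1;reverse2.domain.de:reverse2"
# pcs property set stonith-enabled=true
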
0 Answers