1

I've been trying to get pgpool2 + watchdog running in a local docker stack. It's actually pretty easy without watchdog, and I was able to follow the docs and have pgpool doing its thing for load balancing in master-slave mode.

The trouble comes with Watchdog. Whichever pgpool container starts first ends up electing itself the master node because it initializes first. Ok, no problem. However, any other nodes that initialize and attempt to connect to the master are outright rejected, causing them exit.

(Apologies ahead of time for allllll the logs and configuration files here; I limited the confs to the watchdog-relevant sections, but I can add more if requested)

pgpool-1 conf

use_watchdog = on
trusted_servers = 'app'
ping_path = '/bin'
wd_ipc_socket_dir = '/var/run/pgpool'
wd_monitoring_interfaces_list = ''
wd_hostname = '172.31.0.7'
wd_port = 9000
wd_authkey = ''

delegate_IP = ''
ifconfig_path = '/sbin'
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig eth0:0 down'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_monitoring_interfaces_list = 'eth0'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'pgpool2'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth0'

wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

other_pgpool_hostname0 = 'pgpool2'
other_pgpool_port0 = 5432
other_wd_port0 = 9000

pgpool-2 conf

use_watchdog = on
trusted_servers = 'srp'
ping_path = '/bin'
wd_ipc_socket_dir = '/var/run/pgpool'
wd_monitoring_interfaces_list = ''
wd_hostname = '172.31.0.9'
wd_port = 9000
wd_authkey = ''
delegate_IP = ''
ifconfig_path = '/sbin'
if_up_cmd = 'ifconfig eth0:0 inet $_IP_$ netmask 255.255.255.0'
if_down_cmd = 'ifconfig eth0:0 down'
arping_path = '/usr/sbin'           # arping command path
arping_cmd = 'arping -U $_IP_$ -w 1'
clear_memqcache_on_escalation = on
wd_escalation_command = ''
wd_monitoring_interfaces_list = 'eth0'
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'pgpool1'
heartbeat_destination_port0 = 9694 
heartbeat_device0 = 'eth0'

wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

other_pgpool_hostname0 = 'pgpool1'
other_pgpool_port0 = 5432
other_wd_port0 = 9000

docker-compose.yml

services:
...

  database:
    container_name: "database"
    image: postgres:9.5-custom
    environment:
      - POSTGRES_PASSWORD=password
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - '/var/lib/postgresql/data'
    expose:
      - '5432'
    ports:
      - '5431:5432'
  database-slave:
    container_name: "database-slave"
    image: postgres:9.5-custom
    environment:
      - POSTGRES_PASSWORD=password
      - PGDATA=/var/lib/postgresql/data/pgdata
      - REPLICATE_FROM=database
    volumes:
      - '/var/lib/postgresql/data'
    expose:
     - '5432'
    ports:
      - '5430:5432'
    depends_on:
      - database
  pgpool1:
    container_name: "pgpool1"
    image: pgpool:latest
    depends_on:
      - database
    environment:
      - PGPOOL_BACKENDS=0:database:5432,1:database-slave:5432
      - PGPOOL_SR_CHECK_PASSWORD=password
      - PGPOOL_POOL_PASSWD_USER=someuser
      - PGPOOL_POOL_PASSWD_PASSWORD=password
      - PGPOOL_WD_TRUSTED_SERVERS=app
      - PGPOOL_OTHER_PGPOOL_HOSTNAMES=0:pgpool2:9000
      - PGPOOL_WD_HEARTBEAT_DESTINATIONS=0:pgpool2:eth0
      - SLEEP=10
    expose:
      - '9000'
      - '5432'
      - '9694'

  pgpool2:
    container_name: "pgpool2"
    image: pgpool:latest
    depends_on:
      - database
    environment:
      - PGPOOL_BACKENDS=0:database:5432,1:database-slave:5432
      - PGPOOL_SR_CHECK_PASSWORD=password
      - PGPOOL_POOL_PASSWD_USER=someuser
      - PGPOOL_POOL_PASSWD_PASSWORD=password
      - PGPOOL_WD_TRUSTED_SERVERS=app
      - PGPOOL_OTHER_PGPOOL_HOSTNAMES=0:pgpool1:9000
      - PGPOOL_WD_HEARTBEAT_DESTINATIONS=0:pgpool1:eth0
      - SLEEP=15
    expose:
      - '9000'
      - '5432'
      - '9694'

pgpool-1 (master) output:

2018-10-01 18:00:56: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:00:56: pid 21: DETAIL:  num_backends: 2 total_weight: 2.000000
2018-10-01 18:00:56: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:00:56: pid 21: DETAIL:  backend 0 weight: 1073741823.500000 flag: 0000
2018-10-01 18:00:56: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:00:56: pid 21: DETAIL:  backend 1 weight: 1073741823.500000 flag: 0000
2018-10-01 18:00:56: pid 21: DEBUG:  loading hba configuration
2018-10-01 18:00:56: pid 21: DETAIL:  loading file :"/etc/pgpool2/pool_hba.conf" for client authentication configuration file
2018-10-01 18:00:56: pid 21: LOG:  Backend status file /var/log/postgresql/pgpool_status does not exist
2018-10-01 18:00:56: pid 21: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
2018-10-01 18:00:56: pid 21: DEBUG:  ProcessInfo: num_init_children (32) * sizeof(ProcessInfo) (32) = 1024 bytes requested for shared memory
2018-10-01 18:00:56: pid 21: DEBUG:  Request info are: sizeof(POOL_REQUEST_INFO) 5304 bytes requested for shared memory
2018-10-01 18:00:56: pid 21: DEBUG:  Recovery management area: sizeof(int) 4 bytes requested for shared memory
2018-10-01 18:00:56: pid 21: LOG:  waiting for watchdog to initialize
2018-10-01 18:00:56: pid 22: LOG:  setting the local watchdog node name to "172.31.0.8:5432 Linux 12affe97a14d"
2018-10-01 18:00:56: pid 22: LOG:  watchdog cluster is configured with 1 remote nodes
2018-10-01 18:00:56: pid 22: LOG:  watchdog remote node:0 on pgpool2:9000
2018-10-01 18:00:56: pid 22: DEBUG:  adding monitoring interface [0] name eth0 index 1950
2018-10-01 18:00:56: pid 22: INFO:  IPC socket path: "/var/run/pgpool/.s.PGPOOLWD_CMD.9000"
2018-10-01 18:00:56: pid 22: DEBUG:  network interface lo having flags 65609
2018-10-01 18:00:56: pid 22: DEBUG:  network interface tunl0 having flags 128
2018-10-01 18:00:56: pid 22: DEBUG:  network interface ip6tnl0 having flags 128
2018-10-01 18:00:56: pid 22: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:00:56: pid 22: DEBUG:  network interface "eth0" link is active
2018-10-01 18:00:56: pid 22: DEBUG:  network interface "eth0" link is up
2018-10-01 18:00:56: pid 22: DEBUG:  network interface lo having flags 65609
2018-10-01 18:00:56: pid 22: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:00:56: pid 22: DEBUG:  network interface "eth0" link is active
2018-10-01 18:00:56: pid 22: DEBUG:  network interface "eth0" link is up
2018-10-01 18:00:56: pid 22: DEBUG:  network interface "eth0" is up and we can continue
2018-10-01 18:00:56: pid 22: LOG:  watchdog node state changed from [DEAD] to [LOADING]
2018-10-01 18:00:56: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2018-10-01 18:00:56: pid 22: DEBUG:  error in outbound connection to pgpool2:9000
2018-10-01 18:00:56: pid 22: DETAIL:  Connection refused
2018-10-01 18:01:01: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = TIMEOUT Current State = LOADING
2018-10-01 18:01:01: pid 22: LOG:  watchdog node state changed from [LOADING] to [JOINING]
2018-10-01 18:01:01: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = JOINING
2018-10-01 18:01:01: pid 22: DEBUG:  error in outbound connection to pgpool2:9000
2018-10-01 18:01:01: pid 22: DETAIL:  Connection refused
2018-10-01 18:01:01: pid 22: LOG:  new watchdog node connection is received from "172.31.0.9:27844"
2018-10-01 18:01:01: pid 22: DEBUG:  received watchdog packet type:A
2018-10-01 18:01:01: pid 22: DEBUG:  reading packet type A of length 242
2018-10-01 18:01:01: pid 22: DEBUG:  ADD NODE MESSAGE from hostname:"172.31.0.9" port:9000 pgpool_port:5432
2018-10-01 18:01:01: pid 22: NOTICE:  add node from hostname:"172.31.0.9" port:9000 pgpool_port:5432 rejected.
2018-10-01 18:01:01: pid 22: DETAIL:  verify the other watchdog node configurations
2018-10-01 18:01:01: pid 22: DEBUG:  sending packet, watchdog node:[�
                                                                     WaVU] command id:[4] type:[REJECT] state:[JOINING]
2018-10-01 18:01:01: pid 22: DEBUG:  sending watchdog packet to socket:7, type:[R], command ID:4, data Length:0

pgpool-2 (crashes) output:

2018-10-01 18:01:01: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:01:01: pid 21: DETAIL:  num_backends: 2 total_weight: 2.000000
2018-10-01 18:01:01: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:01:01: pid 21: DETAIL:  backend 0 weight: 1073741823.500000 flag: 0000
2018-10-01 18:01:01: pid 21: DEBUG:  initializing pool configuration
2018-10-01 18:01:01: pid 21: DETAIL:  backend 1 weight: 1073741823.500000 flag: 0000
2018-10-01 18:01:01: pid 21: DEBUG:  loading hba configuration
2018-10-01 18:01:01: pid 21: DETAIL:  loading file :"/etc/pgpool2/pool_hba.conf" for client authentication configuration file
2018-10-01 18:01:01: pid 21: LOG:  Backend status file /var/log/postgresql/pgpool_status does not exist
2018-10-01 18:01:01: pid 21: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
2018-10-01 18:01:01: pid 21: DEBUG:  ProcessInfo: num_init_children (32) * sizeof(ProcessInfo) (32) = 1024 bytes requested for shared memory
2018-10-01 18:01:01: pid 21: DEBUG:  Request info are: sizeof(POOL_REQUEST_INFO) 5304 bytes requested for shared memory
2018-10-01 18:01:01: pid 21: DEBUG:  Recovery management area: sizeof(int) 4 bytes requested for shared memory
2018-10-01 18:01:01: pid 21: LOG:  waiting for watchdog to initialize
2018-10-01 18:01:01: pid 22: LOG:  setting the local watchdog node name to "172.31.0.9:5432 Linux b251a3a36756"
2018-10-01 18:01:01: pid 22: LOG:  watchdog cluster is configured with 1 remote nodes
2018-10-01 18:01:01: pid 22: LOG:  watchdog remote node:0 on pgpool1:9000
2018-10-01 18:01:01: pid 22: DEBUG:  adding monitoring interface [0] name eth0 index 1952
2018-10-01 18:01:01: pid 22: INFO:  IPC socket path: "/var/run/pgpool/.s.PGPOOLWD_CMD.9000"
2018-10-01 18:01:01: pid 22: DEBUG:  network interface lo having flags 65609
2018-10-01 18:01:01: pid 22: DEBUG:  network interface tunl0 having flags 128
2018-10-01 18:01:01: pid 22: DEBUG:  network interface ip6tnl0 having flags 128
2018-10-01 18:01:01: pid 22: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:01:01: pid 22: DEBUG:  network interface "eth0" link is active
2018-10-01 18:01:01: pid 22: DEBUG:  network interface "eth0" link is up
2018-10-01 18:01:01: pid 22: DEBUG:  network interface lo having flags 65609
2018-10-01 18:01:01: pid 22: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:01:01: pid 22: DEBUG:  network interface "eth0" link is active
2018-10-01 18:01:01: pid 22: DEBUG:  network interface "eth0" link is up
2018-10-01 18:01:01: pid 22: DEBUG:  network interface "eth0" is up and we can continue
2018-10-01 18:01:01: pid 22: LOG:  watchdog node state changed from [DEAD] to [LOADING]
2018-10-01 18:01:01: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = STATE CHANGED Current State = LOADING
2018-10-01 18:01:01: pid 22: LOG:  new outbound connection to pgpool1:9000
2018-10-01 18:01:01: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = NEW OUTBOUND_CONNECTION Current State = LOADING
2018-10-01 18:01:01: pid 22: DEBUG:  sending packet, watchdog node:[] command id:[4] type:[ADD NODE] state:[LOADING]
2018-10-01 18:01:01: pid 22: DEBUG:  sending watchdog packet to socket:7, type:[A], command ID:4, data Length:242
2018-10-01 18:01:01: pid 22: DEBUG:  received watchdog packet type:R
2018-10-01 18:01:01: pid 22: DEBUG:  reading packet type R of length 0
2018-10-01 18:01:01: pid 22: DEBUG:  STATE MACHINE INVOKED WITH EVENT = PACKET RECEIVED Current State = LOADING
2018-10-01 18:01:01: pid 22: DEBUG:  received packet, watchdog node:[] command id:[4] type:[REJECT] state:[LOADING]
2018-10-01 18:01:01: pid 22: FATAL:  Add to watchdog cluster request is rejected by node "pgpool1:9000"
2018-10-01 18:01:01: pid 22: HINT:  check the watchdog configurations.
2018-10-01 18:01:01: pid 22: LOG:  Watchdog is shutting down
2018-10-01 18:01:01: pid 21: DEBUG:  reaper handler
2018-10-01 18:01:01: pid 21: DEBUG:  watchdog child process with pid: 22 exit with FATAL ERROR. pgpool-II will be shutdown
2018-10-01 18:01:01: pid 21: LOG:  watchdog child process with pid: 22 exits with status 768
2018-10-01 18:01:01: pid 21: FATAL:  watchdog child process exit with fatal error. exiting pgpool-II
2018-10-01 18:01:01: pid 23: LOG:  setting the local watchdog node name to "172.31.0.9:5432 Linux b251a3a36756"
2018-10-01 18:01:01: pid 23: LOG:  watchdog cluster is configured with 1 remote nodes
2018-10-01 18:01:01: pid 23: LOG:  watchdog remote node:0 on pgpool1:9000
2018-10-01 18:01:01: pid 23: DEBUG:  adding monitoring interface [0] name eth0 index 1952
2018-10-01 18:01:01: pid 23: INFO:  IPC socket path: "/var/run/pgpool/.s.PGPOOLWD_CMD.9000"
2018-10-01 18:01:01: pid 23: DEBUG:  network interface lo having flags 65609
2018-10-01 18:01:01: pid 23: DEBUG:  network interface tunl0 having flags 128
2018-10-01 18:01:01: pid 23: DEBUG:  network interface ip6tnl0 having flags 128
2018-10-01 18:01:01: pid 23: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:01:01: pid 23: DEBUG:  network interface "eth0" link is active
2018-10-01 18:01:01: pid 23: DEBUG:  network interface "eth0" link is up
2018-10-01 18:01:01: pid 23: DEBUG:  network interface lo having flags 65609
2018-10-01 18:01:01: pid 23: DEBUG:  network interface eth0 having flags 69699
2018-10-01 18:01:01: pid 23: DEBUG:  network interface "eth0" link is active
2018-10-01 18:01:01: pid 23: DEBUG:  network interface "eth0" link is up
2018-10-01 18:01:01: pid 23: DEBUG:  network interface "eth0" is up and we can continue

pgpool-2 exits at the point of that final log entry.

What I've tried, what I've confirmed

  • tried running pgpool as root, tried running pgpool as postgres
  • tried wd_authkey, but not SSL
  • tried running with docker privileged: true
  • tried explicitly defining ports: instead of just exposing them, so the docker networking wouldn't have a need to map a random port to the specified exposed port
  • tried with a wildcard listen_address, tried with the host ip as the listen_address (this ip is derived from within the docker container, so I'm not sure that would work
  • tried with and without the wd_heartbeat settings defined
  • diffed the pgpool.conf for both processes, the only settings that were different were the other_wd_hosts, wd_hostname, and listen_address if I had it set to non-wildcard

At this point I've stepped away a few times, and revisited with a clear head, I still can't figure out why I'm seeing this behavior. Anything I'm missing here that could case the connection to be rejected by the master watchdog?

bluescores
  • 4,437
  • 1
  • 20
  • 34

1 Answers1

2

Oh boy, just minutes after posting this I was looking through the output I used to double check I didn't leave any server names, account numbers, or anything like that in there. I figured out the problem.

tl;dr: wd_hostname and other_pgpool_hostname settings have to match up between different pgpool nodes. other_pgpool_hostname is basically a whitelisting setting.

I noticed this line:

2018-10-01 18:01:01: pid 23: LOG: setting the local watchdog node name to "172.31.0.9:5432 Linux b251a3a36756"

And I wondered if having the wd_hostname set to the IP address was a problem, because the other_pgpool_hostname settings weren't being set to the IPs, they were being set to the docker-compose.yml service names (which is correct).

I changed the configurations to set the wd_hostname to match the docker-compose.yml service name of a given container, and magically everything connected.

bluescores
  • 4,437
  • 1
  • 20
  • 34