This is a sub-part of a cluster I am building. I am executing pcp_recovery_node on the master to build the standby from scratch with the command:
pcp_recovery_node -h 193.185.83.119 -p 9898 -U postgres -n 1
Here, 193.185.83.119 is the VIP. The command successfully builds and starts the standby on node-b (the two nodes are node-a and node-b), but it never returns and simply hangs in the shell:
[postgres@rollc-filesrvr1 data]$ pcp_recovery_node -h 193.185.83.119 -p 9898 -U postgres -n 1
Password:
I have to press Ctrl+C to get out of this session. Later, when I try to create a test database on node-a (master), I get the following error:
postgres=# create database test;
ERROR:  source database "template1" is being accessed by other users
DETAIL:  There is 1 other session using the database.
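Since CREATE DATABASE copies template1 by default, the error means some session is connected to template1 at that moment. A query like the following (standard pg_stat_activity columns, so it should work on 10.x; I have not yet identified which process it turns out to be) can show what is holding it:

```sql
-- List sessions currently connected to template1; these are what
-- block CREATE DATABASE, because it clones template1 by default.
SELECT pid, usename, application_name, client_addr, state, backend_start
FROM pg_stat_activity
WHERE datname = 'template1';
```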
I have confirmed that pgpool.service is running on node-a at the time I run this command, and I have tried both starting and stopping pgpool.service on node-b (standby) before issuing the pcp command. The result remains the same.
I also searched online and tweaked the wd_lifecheck_* settings in pgpool.conf, although I am not sure whether these parameters are related to the problem.
Initially, the related settings were as follows (and I was still getting the same result):
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''
Later on, I found different settings and suggestions in other posts and tried combinations like the following:
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'postgres'
wd_lifecheck_password = ''
or
wd_lifecheck_dbname = 'postgres'
wd_lifecheck_user = 'postgres'
wd_lifecheck_password = ''
But none of these changed the situation, either in the shell or in letting me create the test database on the master. I feel I have reached a dead end.
I am still not able to fully understand the purpose and meaning of these three pgpool parameters, and I suspect I am not configuring them correctly, although there could be other reasons as well.
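For reference, here is my current understanding of these parameters from the pgpool-II documentation (my reading, so possibly wrong): they configure the connection the watchdog's lifecheck process uses to health-check the cluster, and they only matter when wd_lifecheck_method is set to 'query'. A commented sketch:

```ini
# pgpool.conf -- watchdog lifecheck settings (my understanding, per the docs;
# the three wd_lifecheck_* credentials below are only used when
# wd_lifecheck_method = 'query')
wd_lifecheck_method   = 'query'      # 'heartbeat' (default) or 'query'
wd_lifecheck_dbname   = 'template1'  # database the lifecheck query connects to
wd_lifecheck_user     = 'postgres'   # user for that connection
wd_lifecheck_password = ''           # password; empty if trust auth is used
```

If that reading is right, a 'query'-mode lifecheck pointed at template1 would itself keep a session open on template1, which might matter for the CREATE DATABASE error, but I am not certain this is what is happening here.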
Just to help, here are the environment details again:
- node-a and node-b environment: RHEL 7.6
- PostgreSQL version: 10.7
- pgpool-II version: 4.0.3
- replication slots + WAL archiving
Here are the logs from pgpool.service on node-a:
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 16642: LOG: forked new pcp worker, pid=8534 socket=7
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: starting recovering node 1
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: executing recovery
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: DETAIL: starting recovery command: "SELECT pgpool_recovery('recovery_1st_stage', 'node-a-ip', '/data/test/data', '5438', 1)"
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: LOG: executing recovery
Mar 18 21:10:17 node-a pgpool[16583]: 2019-03-18 21:10:17: pid 8534: DETAIL: disabling statement_timeout
Mar 18 21:10:18 node-a pgpool[16583]: 2019-03-18 21:10:18: pid 8534: LOG: node recovery, 1st stage is done
Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: LOG: checking if postmaster is started
Mar 18 21:11:37 node-a pgpool[16583]: 2019-03-18 21:11:37: pid 8534: DETAIL: trying to connect to postmaster on hostname:node-b-ip database:postgres user:postgres (retry 0 times)
... (the same LOG/DETAIL pair repeats 2 more times)
Mar 18 21:11:49 node-a pgpool[16583]: 2019-03-18 21:11:49: pid 8534: LOG: checking if postmaster is started
Mar 18 21:11:49 node-a pgpool[16583]: 2019-03-18 21:11:49: pid 8534: DETAIL: trying to connect to postmaster on hostname:node-a-ip database:template1 user:postgres (retry 0 times)
...and it keeps retrying like this until I press Ctrl+C in the pcp command window; I have seen the retry count go to 30 or more. (Note that these retries connect to database template1 on node-a, which may be related to the CREATE DATABASE error above.)
Also, node-b is never shown as up when I check with pgpool:
postgres=> show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | node-a-ip | 5438 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-03-18 22:59:19
 1       | node-b-ip | 5438 | down   | 0.500000  | standby | 0          | false             | 0                 | 2019-03-18 22:59:19
(2 rows)
EDIT: I am now at least able to fix the last part of this question, i.e. attaching the standby node to the cluster:
[postgres@node-a-hostname]$ pcp_attach_node -n 1
Password:
pcp_attach_node -- Command Successful
and show pool_nodes now reports the correct situation:
postgres=> show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | node-a-ip | 5438 | up     | 0.500000  | primary | 0          | false             | 0                 | 2019-03-18 22:59:19
 1       | node-b-ip | 5438 | up     | 0.500000  | standby | 0          | true              | 0                 | 2019-03-19 11:38:38
(2 rows)
But the underlying problem of not being able to create a database on node-a (master) is still there.
EDIT 2: I tried INSERT and UPDATE on the master and they are correctly replicated to node-b, but CREATE DATABASE is still not working.
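As a workaround attempt (not a fix for the root cause), I suppose the sessions pinned to template1 could be terminated right before creating the database; pg_terminate_backend requires superuser or pg_signal_backend privileges, and I have not confirmed whether the session simply reconnects afterwards:

```sql
-- Terminate every backend connected to template1 (except ourselves)
-- so that CREATE DATABASE can copy it; run as superuser.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'template1' AND pid <> pg_backend_pid();
```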