
I have three PostgreSQL DB nodes (node0, node1, node2, version 9.4) and one Pgpool node (version 3.7).

Step 1:

1.1) node0 is primary, node1 and node2 are standby.

1.2) "show pool_nodes" and "select * from pg_stat_replication" works as expected.

1.3) "insert" and "select" work as expected.

Step 2:

2.1) I manually stopped postgresql service in node0.

2.2) Pgpool runs the failover script; node1 becomes primary and node2 becomes standby (a simplified failover script sketch follows this step list).

2.3) "Show pool_nodes" shows that node 0 is down and "select * from pg_stat_replication" only shows one stand-by node which is node2. they are correct.

2.4) "insert" and "select" work as expected.

Step 3:

3.1) I manually start node0 as a standby of node1 (see the recovery.conf sketch after this step list).

3.2) "select * from pg_stat_replication" shows that both node0 and node2 are stand-by, which work as expected.

3.3) "show pool_nodes" shows node0 is still "down", which is expected.

3.4) "pcp_attach_node -n 0", which brings back node0. "show pool_nodes" shows node0 is in "up" status now, which is expected.

3.5) However, "insert" fails with the message "ERROR: cannot execute INSERT in a read-only transaction".

3.6) "select * from pg_stat_replication" shows 0 rows, which means two stand-by nodes disappeared.

3.7) Obviously, all "insert" and "select" queries now go to node0, not node1.
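
For 3.1, node0 was restarted with a 9.4-style recovery.conf pointing at node1; roughly this, with the connection details as placeholders:

    # recovery.conf on node0 (PostgreSQL 9.4 syntax)
    standby_mode = 'on'
    primary_conninfo = 'host=node1 port=5432 user=replicator'
    recovery_target_timeline = 'latest'

Step 3.4 is the pcp command mentioned in the comments:

    pcp_attach_node -U pgpool -h localhost -p 9898 -n 0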

Pgpool is running in master/slave streaming replication mode.
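
The relevant pgpool.conf settings, as I understand the mode, are roughly the following excerpt (not my full config; values are illustrative):

    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    load_balance_mode = on
    sr_check_period = 10
    sr_check_user = 'pgpool'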

So, what did I miss in step 3.4 to bring the failed node back?

xudesheng
  • Make sure all your failure modes work first. There's no point in recovery from a failure condition if the systems can't fail over properly in the first place. For example, what happens in step 2 if you pull the network cable(s) from `node0` instead of just manually stopping the `postgresql` service? What happens if you literally pull the plug on `node0` and `node2` simultaneously, simulating a power failure that got 2 of your 3 servers? I spent 6+ months a few years back trying to get Pgpool working for a customer... – Andrew Henle Feb 26 '18 at 21:22
  • @AndrewHenle, thank you for your quick response. All failure modes work perfectly; I have tested different cases on node0, node1 and node2. Right now I'm stuck on this simple recovery scenario. – xudesheng Feb 26 '18 at 21:24
  • This one looks familiar: https://stackoverflow.com/questions/40000250/how-do-i-get-pcp-to-automatically-attach-nodes-to-postgres-pgpool – Andrew Henle Feb 26 '18 at 21:27
  • @AndrewHenle yes, I learned about pcp_attach_node from that post. "pcp_attach_node -U pgpool -h localhost -p 9898 -n 0" brings node0 back in my case, but Pgpool then connects to node0, which is now a standby; it should connect to node1 (the new master) for "insert". – xudesheng Feb 26 '18 at 21:29

1 Answer


I have run into the same problem as you. However, my Pgpool version is 3.7.2. I found that this commit solves my problem:

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=e8baa3f0ee1a24f833cfa6cc5d0104b5cefe7b04

I have three PostgreSQL database nodes in master/slave replication. My Pgpool works fine while node 0 is primary. After the primary fails over to node 1 and node 0 fails back as a standby, Pgpool sends my write queries to node 0, and I get the read-only transaction error.
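
A quick way to confirm which node is actually read-only is to ask each node directly, bypassing Pgpool (host names and user are placeholders here):

    # 't' means the node is in recovery (standby), 'f' means it accepts writes
    psql -h node0 -U postgres -c "select pg_is_in_recovery();"
    psql -h node1 -U postgres -c "select pg_is_in_recovery();"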

I did another test: I set up master/slave replication in which node 0 is a hot standby and node 1 is primary, started Pgpool, and tried to create/insert into the database. Pgpool always sent my write queries to node 0. After I downloaded the latest source and built Pgpool myself, the problem was solved.
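
The build itself is the usual autotools routine; roughly what I did, with the clone URL and install prefix from memory (double-check them against the pgpool-II site):

    git clone https://git.postgresql.org/git/pgpool2.git
    cd pgpool2
    ./configure --prefix=/usr/local/pgpool-II
    make
    sudo make install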

  • Thank you for your help. Could you please share the steps you went through? I want to verify this today. Thanks. – xudesheng Mar 09 '18 at 20:32
  • I think 3.7.2 is suffering from a bug that explains the behavior above. 3.7.2 and 3.7.3 both have a bug where failover from node 1 to node 0 does not work (a patched minor release will be delivered soon), so right now one should stick to 3.7.1. – Saule May 26 '18 at 08:42