
I have three PostgreSQL DB nodes (node0, node1, node2, version 9.4) and one Pgpool node (version 3.7).

Step 1:

1.1) node0 is primary, node1 and node2 are standby.

1.2) "show pool_nodes" and "select * from pg_stat_replication" works as expected.

1.3) "insert" and "select" work as expected.

Step 2:

2.1) I manually stopped postgresql service in node0.

2.2) Pgpool runs the failover script; node1 becomes primary and node2 becomes standby (a simplified failover script sketch follows this step list).

2.3) "Show pool_nodes" shows that node 0 is down and "select * from pg_stat_replication" only shows one stand-by node which is node2. they are correct.

2.4) "insert" and "select" work as expected.

Step 3:

3.1) I manually start node0 as a standby of node1 (see the recovery.conf sketch after this step list).

3.2) "select * from pg_stat_replication" shows that both node0 and node2 are stand-by, which work as expected.

3.3) "show pool_nodes" shows node0 is still "down", which is expected.

3.4) "pcp_attach_node -n 0", which brings back node0. "show pool_nodes" shows node0 is in "up" status now, which is expected.

3.5) However, "insert" fails with the message "ERROR: cannot execute INSERT in a read-only transaction".

3.6) "select * from pg_stat_replication" shows 0 rows, which means two stand-by nodes disappeared.

3.7) Obviously, all "insert" and "select" queries now go to node0, not node1.
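
For 3.1, node0 was restarted with a 9.4-style recovery.conf pointing at node1; roughly this, with the connection details as placeholders:

    # recovery.conf on node0 (PostgreSQL 9.4 syntax)
    standby_mode = 'on'
    primary_conninfo = 'host=node1 port=5432 user=replicator'
    recovery_target_timeline = 'latest'

Step 3.4 is the pcp command mentioned in the comments:

    pcp_attach_node -U pgpool -h localhost -p 9898 -n 0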

Pgpool is running in master/slave streaming replication mode.
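
The relevant pgpool.conf settings, as I understand the mode, are roughly the following excerpt (not my full config; values are illustrative):

    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    load_balance_mode = on
    sr_check_period = 10
    sr_check_user = 'pgpool'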

So, what did I miss in step 3.4 to bring the failed node back?

xudesheng
  • Make sure all your failure modes work first. There's no point in recovery from a failure condition if the systems can't fail over properly in the first place. For example, what happens in step 2 if you pull the network cable(s) from `node0` instead of just manually stopping the `postgresql` service? What happens if you literally pull the plug on `node0` and `node2` simultaneously, simulating a power failure that got 2 of your 3 servers? I spent 6+ months a few years back trying to get Pgpool working for a customer... – Andrew Henle Feb 26 '18 at 21:22
  • @AndrewHenle, thank you for your quick response. All failure modes work perfectly; I have tested different cases on node0, node1 and node2. Right now I'm stuck on this simple recovery scenario. – xudesheng Feb 26 '18 at 21:24
  • This one looks familiar: https://stackoverflow.com/questions/40000250/how-do-i-get-pcp-to-automatically-attach-nodes-to-postgres-pgpool – Andrew Henle Feb 26 '18 at 21:27
  • @AndrewHenle yes, I learned about pcp_attach_node from that post. "pcp_attach_node -U pgpool -h localhost -p 9898 -n 0" brings node0 back in my case, but Pgpool then connects to node0, which is now a standby; it should connect to node1 (the new master) for "insert". – xudesheng Feb 26 '18 at 21:29

1 Answer


I have run into the same problem as you. However, my Pgpool version is 3.7.2. I found that this commit solves my problem:

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=e8baa3f0ee1a24f833cfa6cc5d0104b5cefe7b04

I have three PostgreSQL database nodes in master/slave replication. My Pgpool works fine while node 0 is primary. After the primary fails over to node 1 and node 0 fails back as a standby, Pgpool sends my write queries to node 0, and I get the read-only transaction error.
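
A quick way to confirm which node is actually read-only is to ask each node directly, bypassing Pgpool (host names and user are placeholders here):

    # 't' means the node is in recovery (standby), 'f' means it accepts writes
    psql -h node0 -U postgres -c "select pg_is_in_recovery();"
    psql -h node1 -U postgres -c "select pg_is_in_recovery();"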

I did another test: I set up master/slave replication in which node 0 is a hot standby and node 1 is primary, started Pgpool, and tried to create/insert into the database. Pgpool always sent my write queries to node 0. After I downloaded the latest source and built Pgpool myself, the problem was solved.
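
The build itself is the usual autotools routine; roughly what I did, with the clone URL and install prefix from memory (double-check them against the pgpool-II site):

    git clone https://git.postgresql.org/git/pgpool2.git
    cd pgpool2
    ./configure --prefix=/usr/local/pgpool-II
    make
    sudo make install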

  • Thank you for your help. Could you please share the steps you went through? I want to verify this today. Thanks. – xudesheng Mar 09 '18 at 20:32
  • I think 3.7.2 is suffering from a bug that explains the behavior above. 3.7.2 and 3.7.3 both have a bug where failover from node 1 to node 0 does not work (a patched minor release will be delivered soon), so right now one should stick to 3.7.1. – Saule May 26 '18 at 08:42