I have a two node PostgreSQL cluster configured through repmgr
.
The database topology looks like this:
db1 - 10.10.10.50 ( master )
db2 - 10.10.10.60 ( standby )
wit - 10.10.10.70 ( witness )
The creation of cluster ( as the replication and automatic failover ) work as expected, but the problem is the following.
Let's say that in my cluster the db1
node goes down, then the expected behaviour is that the db2
node gets promoted to a new master. That is all good and the logs prove it:
[WARNING] connection to upstream has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision
[ERROR] unable to reconnect to upstream after 60 seconds...
[ERROR] connection to database failed: could not connect to server: No route to host
Is the server running on host "10.10.10.50" and accepting
TCP/IP connections on port 5432?
[ERROR] connection to database failed: could not connect to server: No route to host
Is the server running on host "10.10.10.50" and accepting
TCP/IP connections on port 5432?
[NOTICE] promoting standby
[NOTICE] promoting server using '/usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main promote'
[NOTICE] STANDBY PROMOTE successful. You should REINDEX any hash indexes you have.
The db2
node is now promoted to a new master, and it's all good, until the db1
node gets back up.
In that scenario, it's expected that the db1
to became a new standby, but that is not the case as I end up with both nodes acting as master ?!
So my question is, after a failover, how can I prevent that both nodes act as a master ( in the docs it says to include a third node to be a witness - I have that ), but the desired effect is not there.
Here is an example of my repmgr.conf file:
cluster=test_cluster
node=1
node_name=db1
conninfo='host=10.10.10.50 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='repmgr standby promote -f /etc/repmgr/repmgr.conf'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
pg_bindir=/usr/lib/postgresql/9.3/bin
And the cluster state after the db1
node gets back up:
repmgr -f /etc/repmgr/repmgr.conf cluster show
Role | Connection String
* master | host=10.10.10.50 dbname=repmgr user=repmgr
* master | host=10.10.10.60 dbname=repmgr user=repmgr
witness | host=10.10.10.70 dbname=repmgr user=repmgr port=5499
Thanks a bunch,
Best regards