I've got the following configuration:

  • Redis_version:3.2.0
  • 3 master nodes and 3 slave nodes

Each master node is replicated to one slave, and everything works correctly. When one master node is killed with the "kill" command, the corresponding slave becomes the master, as expected. After a few seconds, cluster_state returns to OK.

But if two master nodes fail simultaneously, neither of the associated slave nodes becomes a master, and cluster_state stays at "fail".

Output of the cluster nodes command, taken on the slave at 127.0.0.1:7003:
b60c284a515b31aa6b11022fc07cf1a399171e04 127.0.0.1:7000 master,fail? - 1464690455030 1464690454930 1 disconnected 0-5460
637d1f074419963653b206c5ed7cbed4c3d0ace0 127.0.0.1:7001 master,fail? - 1464690455030 1464690454930 2 disconnected 5461-10922
d2aae2a3d87c6407e002076740c8febf80f37865 127.0.0.1:7003 myself,slave b60c284a515b31aa6b11022fc07cf1a399171e04 0 0 4 connected
72d4c9ce140fb57436c1b21702bf3c646ef29db3 127.0.0.1:7002 master - 0 1464690718480 3 connected 10923-16383
af34a7b2241943baf23e634e81b552d8bf23cdd0 127.0.0.1:7005 slave 72d4c9ce140fb57436c1b21702bf3c646ef29db3 0 1464690718480 6 connected
d0fec0609c9e786ac9ca4629f36cabd7c5c3130c 127.0.0.1:7004 slave 637d1f074419963653b206c5ed7cbed4c3d0ace0 0 1464690718480 5 connected
halfer
chicharito

1 Answer

Automatic slave failover won't happen when half or more of the masters are disconnected, because the failover election requires a majority (more than half) of the masters to agree. With 3 masters, a slave needs votes from 2 of them, and here only 1 master is still reachable.
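The majority rule can be checked against cluster nodes output like the one in the question. This is a minimal sketch, assuming a POSIX shell with awk; count_masters and has_master_majority are hypothetical helper names, not part of Redis, and the sketch treats every node flagged as master as a voter.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Redis). Both read `cluster nodes`
# output on stdin. Assumption: every node flagged "master" is a voter.

# Print "<total masters> <masters not flagged fail or fail?>".
# Field 3 is the comma-separated flags list, e.g. "master,fail?".
count_masters() {
    awk '$3 ~ /master/ { total++; if ($3 !~ /fail/) up++ }
         END { printf "%d %d\n", total, up }'
}

# Exit 0 when more than half of the masters are reachable, i.e. when an
# automatic failover election could still collect enough votes.
has_master_majority() {
    set -- $(count_masters)
    needed=$(( $1 / 2 + 1 ))
    [ "$2" -ge "$needed" ]
}
```

For example, `redis-cli -h 127.0.0.1 -p 7003 cluster nodes | has_master_majority || echo "no majority"` would report no majority for the output in the question: 3 masters are known, 1 is reachable, and 2 votes are needed.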

To start a manual failover, connect to the slave node with redis-cli and send a cluster failover TAKEOVER command. The TAKEOVER option is required here because it promotes the slave without needing agreement from a majority of the masters.

In your case:

redis-cli -h 127.0.0.1 -p 7003 cluster failover takeover

After :7003 becomes a master, the other slave will start an automatic failover as well, since more than half (2 of 3) of the masters are then alive.
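To confirm that the cluster has recovered, the cluster_state field from cluster info can be read on any node. This is a small sketch, assuming a POSIX shell with tr and awk; cluster_state here is a hypothetical helper function, not a Redis command.

```shell
#!/bin/sh
# Hypothetical helper: print the cluster_state value from `cluster info`
# output on stdin. Carriage returns are stripped in case lines end in CRLF.
cluster_state() {
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}
```

Running `redis-cli -h 127.0.0.1 -p 7003 cluster info | cluster_state` should print ok once both failovers have completed, and fail until then.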

neuront
  • How would you make sure that, after the failed master node comes back online, it becomes the master again and the temporary failover master goes back to being a slave? – Niko Dierickx Aug 16 '21 at 12:24