
I have created a Redis Cluster as below:

xxx.xxx.xxx.195:9100    xxx.xxx.xxx.196:9100    xxx.xxx.xxx.197:9100
xxx.xxx.xxx.195:9200    xxx.xxx.xxx.196:9200    xxx.xxx.xxx.197:9200

I have found that the cluster FAILS to recover when I stop, at the same time, the 2 Redis instances on xxx.xxx.xxx.196, both of which are masters of the cluster:

xxx.xxx.xxx.195:9100 (Master)    xxx.xxx.xxx.196:9100 (Master)   xxx.xxx.xxx.197:9100 (Slave)
xxx.xxx.xxx.195:9200 (Slave)     xxx.xxx.xxx.196:9200 (Master)   xxx.xxx.xxx.197:9200 (Slave)

However, if I instead stop the 2 instances on the .195 server at the same time (9100 is a master and 9200 is a slave), the cluster recovers and works fine.

Cluster configuration file:

protected-mode no
activerehashing yes
cluster-enabled yes
cluster-config-file /opt/redis/conf/nodes9100.conf
cluster-slave-validity-factor 0
cluster-node-timeout 5000
appendonly yes

Redis logs on the dedicated slave server:

28939:S 09 Oct 16:08:32.834 - 0 clients connected (0 slaves), 1327200 bytes in use
28939:S 09 Oct 16:08:32.834 * Connecting to MASTER xxx.xxx.xxx.196:9200
28939:S 09 Oct 16:08:32.835 * MASTER <-> SLAVE sync started
28939:S 09 Oct 16:08:32.835 # Error condition on socket for SYNC: Connection refused
28939:S 09 Oct 16:08:33.837 * Connecting to MASTER xxx.xxx.xxx.196:9200
28939:S 09 Oct 16:08:33.837 * MASTER <-> SLAVE sync started
28939:S 09 Oct 16:08:33.837 # Error condition on socket for SYNC: Connection refused
28939:S 09 Oct 16:08:34.839 * Connecting to MASTER xxx.xxx.xxx.196:9200
28939:S 09 Oct 16:08:34.839 * MASTER <-> SLAVE sync started
28939:S 09 Oct 16:08:34.839 # Error condition on socket for SYNC: Connection refused
28939:S 09 Oct 16:08:35.840 * Connecting to MASTER xxx.xxx.xxx.196:9200
28939:S 09 Oct 16:08:35.840 * MASTER <-> SLAVE sync started
28939:S 09 Oct 16:08:35.840 # Error condition on socket for SYNC: Connection refused
28939:S 09 Oct 16:08:36.744 - Node 982d9b0a50b393d5fe604caefc0acaae68547648 reported node b57d59fb5685daeaac7e249d99fa257e9be66f4f as not reachable.
28939:S 09 Oct 16:08:36.844 * Connecting to MASTER xxx.xxx.xxx.196:9200
28939:S 09 Oct 16:08:36.844 * MASTER <-> SLAVE sync started
28939:S 09 Oct 16:08:36.844 # Error condition on socket for SYNC: Connection refused
tiroshanm

1 Answer


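Found the issue:

Source: https://redis.io/topics/cluster-tutorial
"Redis Cluster also provides some degree of availability during partitions, that is in practical terms the ability to continue the operations when some nodes fail or are not able to communicate. However the cluster stops to operate in the event of larger failures (for example when the majority of masters are unavailable)."

In the failing case both instances on .196 were masters, so stopping them took down 2 of the 3 masters. The single remaining master is not a majority, and a slave can only be promoted after it collects votes from a majority of masters, so no failover happens and the cluster stays down. Stopping the .195 instances removes only 1 master; the 2 remaining masters are still a majority, so the failed master's slave gets promoted and the cluster recovers.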
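To make the majority rule concrete for the topology in the question, here is a minimal Python sketch. It is a simplified model, not Redis's actual failover code: it assumes the only condition that matters is whether more than half of all masters are still running (real Redis also needs a surviving slave for each failed master, which this model ignores). `Node`, `masters_majority_alive` and the shortened addresses such as `"196:9100"` (for xxx.xxx.xxx.196:9100) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    addr: str  # shortened address, e.g. "196:9100" for xxx.xxx.xxx.196:9100 (hypothetical)
    role: str  # "master" or "slave", as observed when the instances were stopped


def masters_majority_alive(nodes: list[Node], stopped: set[str]) -> bool:
    """Simplified model: True if more than half of ALL masters are still running.

    Failed masters still count in the total, so losing 2 of 3 masters leaves
    1 alive, which is not a majority, and no slave can win a failover vote.
    """
    masters = [n for n in nodes if n.role == "master"]
    alive = [n for n in masters if n.addr not in stopped]
    return len(alive) > len(masters) // 2


# Roles at the time of the test, taken from the question.
TOPOLOGY = [
    Node("195:9100", "master"), Node("196:9100", "master"), Node("197:9100", "slave"),
    Node("195:9200", "slave"), Node("196:9200", "master"), Node("197:9200", "slave"),
]

# Failing case: both .196 instances (two masters) stopped.
print(masters_majority_alive(TOPOLOGY, {"196:9100", "196:9200"}))  # False
# Recovering case: both .195 instances (one master, one slave) stopped.
print(masters_majority_alive(TOPOLOGY, {"195:9100", "195:9200"}))  # True
```

With 3 masters the cluster can survive losing at most 1 of them at a time, which is why placing two masters on the same server makes that server a single point of failure in this model.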

tiroshanm