
We have a Redis cluster with 4 masters and 4 slaves. The 4 masters are on one physical host and the slaves are on another physical host. We have observed frequent automatic failovers between masters and slaves even though the servers are up and running (we suspect a network glitch). After a failover, CPU usage on the new master goes high and it throws a Redis server exception (see the attached screenshot), disconnecting the clients. Below are the cluster node details:

6adb459bc1cda0ae002109140d04015c531c6910 10.10.52.38:6379 slave 0060ee610b3a52bf88a0202aff0ce63039354578 0 1555648709383 58 connected
46f38129c861ff775badc67cc869493ee28fd166 10.10.52.44:6379 slave 19538764e5cde1014f1fd35afbf1af3a217de7b4 0 1555648708378 66 connected
1833427e42afa74273aa33696ed7e5f80f40e244 10.10.52.40:6379 master - 0 1555648706367 63 connected 6827-9556 15018-16383
a0f14e54f18e1a04c448cd09e851459863e929b0 10.10.52.42:6379 slave d190c341144350bf9dbad67841104ed75ccbdcdc 0 1555648710388 65 connected
0060ee610b3a52bf88a0202aff0ce63039354578 10.10.52.37:6379 master - 0 1555648704357 58 connected 2730-5460 13653-15017
b702bbcddb6a39e2deb8567804fa5d4468fbe5cc 10.10.52.39:6379 slave 1833427e42afa74273aa33696ed7e5f80f40e244 0 1555648708879 63 connected
d190c341144350bf9dbad67841104ed75ccbdcdc 10.10.52.41:6379 master - 0 1555648710385 65 connected 9557-13652
19538764e5cde1014f1fd35afbf1af3a217de7b4 10.10.52.43:6379 myself,master - 0 0 66 connected 0-2729 5461-6826
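
To make the node list above easier to read, here is a minimal sketch that groups it into master/replica pairs and checks slot coverage. It assumes the standard `CLUSTER NODES` field order (id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, slots); the names `parse_cluster_nodes` and `slot_count` are illustrative helpers, not part of any Redis client.

```python
# Sketch: parse the CLUSTER NODES text from the question (copied verbatim).
NODES = """\
6adb459bc1cda0ae002109140d04015c531c6910 10.10.52.38:6379 slave 0060ee610b3a52bf88a0202aff0ce63039354578 0 1555648709383 58 connected
46f38129c861ff775badc67cc869493ee28fd166 10.10.52.44:6379 slave 19538764e5cde1014f1fd35afbf1af3a217de7b4 0 1555648708378 66 connected
1833427e42afa74273aa33696ed7e5f80f40e244 10.10.52.40:6379 master - 0 1555648706367 63 connected 6827-9556 15018-16383
a0f14e54f18e1a04c448cd09e851459863e929b0 10.10.52.42:6379 slave d190c341144350bf9dbad67841104ed75ccbdcdc 0 1555648710388 65 connected
0060ee610b3a52bf88a0202aff0ce63039354578 10.10.52.37:6379 master - 0 1555648704357 58 connected 2730-5460 13653-15017
b702bbcddb6a39e2deb8567804fa5d4468fbe5cc 10.10.52.39:6379 slave 1833427e42afa74273aa33696ed7e5f80f40e244 0 1555648708879 63 connected
d190c341144350bf9dbad67841104ed75ccbdcdc 10.10.52.41:6379 master - 0 1555648710385 65 connected 9557-13652
19538764e5cde1014f1fd35afbf1af3a217de7b4 10.10.52.43:6379 myself,master - 0 0 66 connected 0-2729 5461-6826
"""


def slot_count(ranges):
    """Count hash slots in a list such as ['0-2729', '5461-6826']."""
    total = 0
    for r in ranges:
        lo, _, hi = r.partition("-")
        total += int(hi or lo) - int(lo) + 1  # a single slot has no '-'
    return total


def parse_cluster_nodes(text):
    """Return (masters by node id, replica addresses by master id)."""
    masters, replicas = {}, {}
    for line in text.strip().splitlines():
        f = line.split()
        node_id, addr, flags, master_id = f[0], f[1], f[2].split(","), f[3]
        if "master" in flags:
            masters[node_id] = {"addr": addr, "epoch": int(f[6]), "slots": f[8:]}
        else:
            replicas.setdefault(master_id, []).append(addr)
    return masters, replicas


masters, replicas = parse_cluster_nodes(NODES)
for node_id, m in masters.items():
    print(m["addr"], "epoch", m["epoch"], "slots", slot_count(m["slots"]),
          "replicas", replicas.get(node_id, []))
print("total slots covered:", sum(slot_count(m["slots"]) for m in masters.values()))
```

A config epoch in the high 50s and 60s for a four-master cluster is consistent with the repeated failovers described above, since each successful failover assigns the promoted node a new, higher epoch.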

Here are the cluster config details:

1) "cluster-node-timeout"
2) "15000"
3) "cluster-migration-barrier"
4) "1"
5) "cluster-slave-validity-factor"
6) "10"
7) "cluster-require-full-coverage"
8) "yes"
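
For context, here is a rough sketch of the time windows these settings imply. It follows the rule described in the `redis.conf` comments: a node is suspected as failing after being unreachable for `cluster-node-timeout`, and a replica will not attempt a failover if it has been out of contact with its master for longer than `node-timeout * validity-factor + repl-ping period`. The replication ping period is not shown in the question, so the Redis default of 10 seconds is assumed here.

```python
# Values taken from the config output in the question.
NODE_TIMEOUT_MS = 15000
VALIDITY_FACTOR = 10

# Assumption: repl-ping-slave-period is left at its default of 10 seconds;
# the question does not show this setting.
REPL_PING_PERIOD_S = 10


def failure_suspicion_window_s(node_timeout_ms):
    """Seconds a node can be unreachable before peers flag it as possibly failing."""
    return node_timeout_ms / 1000


def replica_validity_window_s(node_timeout_ms, validity_factor, repl_ping_period_s):
    """Max seconds a replica may be out of contact with its master and still try to fail over."""
    return node_timeout_ms / 1000 * validity_factor + repl_ping_period_s


print("suspected after (s):", failure_suspicion_window_s(NODE_TIMEOUT_MS))
print("replica still eligible up to (s):",
      replica_validity_window_s(NODE_TIMEOUT_MS, VALIDITY_FACTOR, REPL_PING_PERIOD_S))
```

Under these assumptions, a master has to be unreachable for about 15 seconds before a failover can start, so the failovers would point to stalls of at least that length rather than very brief glitches.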

Attached is the log from one of the slaves: https://pastebin.com/GS8ChyeH

Is the configuration correct? How do we prevent this from happening?

Parth Gandhi
  • Here is the commandstats output from 2 nodes, the first from the original master and the second from the slave that was automatically promoted to master and caused the issue: https://pastebin.com/aSej3y5s The usec_per_call value for each command is high on the second master. Does that mean it is receiving more commands and taking more time to process each command? – Parth Gandhi Apr 19 '19 at 05:04
  • Try monitoring the trace logs of the Redis master/slave. You can measure the ping/pong delay between nodes to see how long the slaves take to talk to each other. Also, post some logs from the master at the time of the failure. – Anuj Vishwakarma Apr 26 '19 at 07:00
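
On the `usec_per_call` question in the comments: in `INFO commandstats`, `calls` is the number of invocations, `usec` is the total CPU time spent, and `usec_per_call` is their average, so a higher `usec_per_call` means each call is slower on average, not that more commands arrived. A minimal sketch for comparing two nodes follows; the sample numbers are made up for illustration and are not taken from the pastebin, and `parse_commandstats` and `per_call_ratio` are hypothetical helper names.

```python
def parse_commandstats(text):
    """Parse 'cmdstat_<name>:k=v,k=v,...' lines into {name: {k: float}}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip section headers and blank lines
        name, _, fields = line.partition(":")
        pairs = (kv.split("=", 1) for kv in fields.split(","))
        stats[name[len("cmdstat_"):]] = {k: float(v) for k, v in pairs}
    return stats


def per_call_ratio(before, after):
    """For commands present on both nodes, return after/before usec_per_call."""
    ratios = {}
    for cmd in before.keys() & after.keys():
        base = before[cmd]["usec_per_call"]
        if base > 0:
            ratios[cmd] = after[cmd]["usec_per_call"] / base
    return ratios


# Hypothetical sample data, only to show the shape of the comparison.
ORIGINAL_MASTER = """\
# Commandstats
cmdstat_get:calls=1000,usec=2000,usec_per_call=2.00
cmdstat_set:calls=500,usec=1500,usec_per_call=3.00
"""
PROMOTED_MASTER = """\
# Commandstats
cmdstat_get:calls=1000,usec=50000,usec_per_call=50.00
cmdstat_set:calls=500,usec=1500,usec_per_call=3.00
"""

ratios = per_call_ratio(parse_commandstats(ORIGINAL_MASTER),
                        parse_commandstats(PROMOTED_MASTER))
for cmd, ratio in sorted(ratios.items()):
    print(f"{cmd}: {ratio:.1f}x per-call time on the promoted master")
```

Comparing `calls` between the two nodes in the same way would show whether the promoted master is also receiving more traffic.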

0 Answers