
I've installed the http://rediscart-claytondev.rhcloud.com/build/manifest/redis-2.8 cartridge and scaled it to 3 gears. REDIS_SENTINEL_QUORUM is set to 2 on each gear. The sentinels start up fine after I changed ~/redis/bin/control from:

erb conf/redis-sentinel.conf.erb | redis-server conf - --sentinel

to:

erb conf/redis-sentinel.conf.erb > conf/redis-sentinel.conf
redis-server conf/redis-sentinel.conf --sentinel

Now, after restarting the cartridge, everything looks all right until I kill the master. The slaves just sit there counting the seconds since they last saw it. The log on one of the slave gears says:

[42612] 25 Feb 14:49:36.548 # Sentinel runid is 88269647396c4fcd07e8a1e6030eb01a7b8adcb3
[42612] 25 Feb 14:49:36.548 # +monitor master 54edcc79ca2895e4a300021f 127.1.1.45 38846 quorum 2
[42612] 25 Feb 14:49:36.548 # +monitor master 54edac43ca2895e4a30001e5 127.1.1.45 38961 quorum 2
[42612] 25 Feb 14:49:36.548 # +monitor master 54edac2cca2895e4a30001cd 127.1.1.46 38821 quorum 2
[42612] 25 Feb 14:49:36.550 * +slave slave 127.1.1.45:16379 127.1.1.45 16379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42605] 25 Feb 14:49:36.603 * MASTER <-> SLAVE sync: receiving 18 bytes from master
[42605] 25 Feb 14:49:36.603 * MASTER <-> SLAVE sync: Flushing old data
[42605] 25 Feb 14:49:36.603 * MASTER <-> SLAVE sync: Loading DB in memory
[42605] 25 Feb 14:49:36.603 * MASTER <-> SLAVE sync: Finished with success
[42612] 25 Feb 14:49:37.700 * +sentinel sentinel 127.1.1.45:26379 127.1.1.45 26379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42612] 25 Feb 14:49:37.748 * +sentinel sentinel 127.1.1.46:26379 127.1.1.46 26379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42612] 25 Feb 14:49:46.632 # +sdown slave 127.1.1.45:16379 127.1.1.45 16379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42612] 25 Feb 14:49:47.735 # +sdown sentinel 127.1.1.45:26379 127.1.1.45 26379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42612] 25 Feb 14:49:47.793 # +sdown sentinel 127.1.1.46:26379 127.1.1.46 26379 @ 54edac2cca2895e4a30001cd 127.1.1.46 38821
[42612] 25 Feb 14:50:06.596 # +sdown master 54edcc79ca2895e4a300021f 127.1.1.45 38846
[42612] 25 Feb 14:50:06.596 # +sdown master 54edac43ca2895e4a30001e5 127.1.1.45 38961
[42605] 25 Feb 14:51:17.914 # Connection with master lost.
[42605] 25 Feb 14:51:17.914 * Caching the disconnected master state.
[42605] 25 Feb 14:51:18.649 * Connecting to MASTER 54edac2cca2895e4a30001cd-redis.ose.dr.myriadpayments.co.uk:38821
[42605] 25 Feb 14:51:18.650 * MASTER <-> SLAVE sync started
[42605] 25 Feb 14:51:18.650 # Error condition on socket for SYNC: Connection refused

UPDATE:

As requested, here are the configs with all comments removed.

Generated with:
erb redis.conf.erb | grep -vE "(^[#]|^$)" > redis.conf && erb redis-sentinel.conf.erb | grep -vE "(^[#]|^$)" > redis-sentinel.conf
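For anyone reproducing this without grep, the same filtering can be sketched in Python. This is only an illustration with a hypothetical helper name (`strip_comments`): like the `grep -vE "(^[#]|^$)"` pattern, it drops lines that start with `#` and lines that are completely empty, and keeps everything else, including indented comments and whitespace-only lines.

```python
def strip_comments(text: str) -> str:
    """Mirror grep -vE "(^[#]|^$)": drop lines starting with '#' and empty lines."""
    kept = [
        line
        for line in text.splitlines()
        # "^$" only matches truly empty lines; "^[#]" only a '#' in column one
        if line != "" and not line.startswith("#")
    ]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    sample = "# comment\n\nport 26379\n  # indented comment is kept\ndaemonize yes\n"
    print(strip_comments(sample), end="")
```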

REDIS 54edcc79ca2895e4a300021f

daemonize yes
pidfile /var/lib/openshift/54edcc79ca2895e4a300021f/redis//pid/redis.pid
port 16379
bind 127.2.69.129
timeout 0
tcp-keepalive 0
loglevel notice
logfile /var/lib/openshift/54edcc79ca2895e4a300021f/redis//logs/redis.log
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/openshift/54edcc79ca2895e4a300021f/app-root/data//.redis/dbs/
slaveof 54edac2cca2895e4a30001cd-redis.openshift.zu 38821
masterauth ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
requirepass ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

REDIS 54edac2cca2895e4a30001cd

daemonize yes
pidfile /var/lib/openshift/54edac2cca2895e4a30001cd/redis//pid/redis.pid
port 16379
bind 127.2.67.1
timeout 0
tcp-keepalive 0
loglevel notice
logfile /var/lib/openshift/54edac2cca2895e4a30001cd/redis//logs/redis.log
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/openshift/54edac2cca2895e4a30001cd/app-root/data//.redis/dbs/
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
requirepass ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

REDIS 54edac43ca2895e4a30001e5

daemonize yes
pidfile /var/lib/openshift/54edac43ca2895e4a30001e5/redis//pid/redis.pid
port 16379
bind 127.2.81.1
timeout 0
tcp-keepalive 0
loglevel notice
logfile /var/lib/openshift/54edac43ca2895e4a30001e5/redis//logs/redis.log
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/openshift/54edac43ca2895e4a30001e5/app-root/data//.redis/dbs/
slaveof 54edac2cca2895e4a30001cd-redis.openshift.zu 38821
masterauth ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
requirepass ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

SENTINEL 54edcc79ca2895e4a300021f

pidfile /var/lib/openshift/54edcc79ca2895e4a300021f/redis//pid/redis-sentinel.pid
daemonize yes
logfile /var/lib/openshift/54edcc79ca2895e4a300021f/redis//logs/redis.log
bind 127.2.69.130
port 26379
sentinel monitor 54edac2cca2895e4a30001cd 54edac2cca2895e4a30001cd-redis.openshift.zu 38821 2
sentinel auth-pass 54edac2cca2895e4a30001cd ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
sentinel down-after-milliseconds 54edac2cca2895e4a30001cd 10000
sentinel parallel-syncs 54edac2cca2895e4a30001cd 1
sentinel failover-timeout 54edac2cca2895e4a30001cd 30000
sentinel monitor 54edac43ca2895e4a30001e5 54edac43ca2895e4a30001e5-redis.openshift.zu 38961 2
sentinel auth-pass 54edac43ca2895e4a30001e5 ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
sentinel down-after-milliseconds 54edac43ca2895e4a30001e5 10000
sentinel parallel-syncs 54edac43ca2895e4a30001e5 1
sentinel failover-timeout 54edac43ca2895e4a30001e5 30000
sentinel monitor 54edcc79ca2895e4a300021f 54edcc79ca2895e4a300021f-redis.openshift.zu 38846 2
sentinel auth-pass 54edcc79ca2895e4a300021f ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
sentinel down-after-milliseconds 54edcc79ca2895e4a300021f 10000
sentinel parallel-syncs 54edcc79ca2895e4a300021f 1
sentinel failover-timeout 54edcc79ca2895e4a300021f 30000

SENTINEL 54edac2cca2895e4a30001cd

pidfile /var/lib/openshift/54edac2cca2895e4a30001cd/redis//pid/redis-sentinel.pid
daemonize yes
logfile /var/lib/openshift/54edac2cca2895e4a30001cd/redis//logs/redis.log
bind 127.2.67.2
port 26379
sentinel monitor 54edac2cca2895e4a30001cd 54edac2cca2895e4a30001cd-redis.openshift.zu 38821 2
sentinel auth-pass 54edac2cca2895e4a30001cd ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
sentinel down-after-milliseconds 54edac2cca2895e4a30001cd 10000
sentinel parallel-syncs 54edac2cca2895e4a30001cd 1
sentinel failover-timeout 54edac2cca2895e4a30001cd 30000

SENTINEL 54edac43ca2895e4a30001e5

pidfile /var/lib/openshift/54edac43ca2895e4a30001e5/redis//pid/redis-sentinel.pid
daemonize yes
logfile /var/lib/openshift/54edac43ca2895e4a30001e5/redis//logs/redis.log
bind 127.2.81.2
port 26379
sentinel monitor 54edac2cca2895e4a30001cd 54edac2cca2895e4a30001cd-redis.openshift.zu 38821 2
sentinel auth-pass 54edac2cca2895e4a30001cd ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5
sentinel down-after-milliseconds 54edac2cca2895e4a30001cd 10000
sentinel parallel-syncs 54edac2cca2895e4a30001cd 1
sentinel failover-timeout 54edac2cca2895e4a30001cd 30000

ptrk
  • This looks like a combination of logs from one or more sentinels and one or more slaves. It would be much easier to help if you posted your sentinel configs as well as the output of info replication on a slave before and after you down a slave. – The Real Bill Feb 27 '15 at 06:33
  • At a glance it looks like you don't have sentinel configured properly. – The Real Bill Feb 27 '15 at 06:34
  • Thanks, that already gave me a hint to compact the configs a bit and compare them, but I'd still appreciate an expert's look. – ptrk Feb 27 '15 at 14:32

1 Answer


Looking at your configs, you have sentinels and Redis instances using the same names. They also share log files, which causes confusion. This is not what you want.
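One way to see the naming mismatch mechanically is to compare which master names each sentinel monitors. The sketch below uses hypothetical helper names and assumes the comment-stripped config text shown in the question; it parses `sentinel monitor <name> <host> <port> <quorum>` lines. Fed the three sentinel configs above, it should report that `54edac43ca2895e4a30001e5` and `54edcc79ca2895e4a300021f` are monitored only by the sentinel on gear `54edcc79ca2895e4a300021f`, while the other two sentinels watch just `54edac2cca2895e4a30001cd`.

```python
from __future__ import annotations


def monitored_masters(conf_text: str) -> dict[str, tuple[str, int, int]]:
    """Map each monitored master name to (host, port, quorum)."""
    masters: dict[str, tuple[str, int, int]] = {}
    for line in conf_text.splitlines():
        parts = line.split()
        # Expected shape: sentinel monitor <master-name> <host> <port> <quorum>
        if len(parts) == 6 and parts[0] == "sentinel" and parts[1] == "monitor":
            masters[parts[2]] = (parts[3], int(parts[4]), int(parts[5]))
    return masters


def inconsistent_masters(confs: dict[str, str]) -> set[str]:
    """Return master names that are NOT monitored by every sentinel."""
    per_sentinel = [set(monitored_masters(text)) for text in confs.values()]
    if not per_sentinel:
        return set()
    everything = set().union(*per_sentinel)
    common = set.intersection(*per_sentinel)
    return everything - common


if __name__ == "__main__":
    a = "sentinel monitor pod1 10.0.0.5 6379 2\nsentinel monitor pod2 10.0.0.6 6379 2\n"
    b = "sentinel monitor pod1 10.0.0.5 6379 2\n"
    print(inconsistent_masters({"sentinel-a": a, "sentinel-b": b}))
```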

You want:

3 sentinel instances with unique names.

One Redis master + one Redis slave (this is a "pod"; give it a unique name that identifies this combination, not a node within it).

Add that pod to your sentinel setup and set its password. Say you named the pod "pod1"; then your config would look like this:

sentinel monitor pod1 <master-ip> <master-port> 2
sentinel auth-pass pod1 <the-master-auth-pass-and-requirepass-setting>
sentinel down-after-milliseconds pod1 10000
sentinel parallel-syncs pod1 1
sentinel failover-timeout pod1 30000
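Because every sentinel in the constellation should carry the same block for a given pod, generating it from one place helps avoid drift between gears. A minimal Python sketch follows; `sentinel_block` is a hypothetical helper, the address, port and password in the usage line are placeholders, and the defaults simply mirror the values in the config above.

```python
def sentinel_block(pod: str, master_ip: str, master_port: int, password: str,
                   quorum: int = 2, down_after_ms: int = 10000,
                   parallel_syncs: int = 1, failover_timeout_ms: int = 30000) -> str:
    """Render the sentinel directives for one master/slave pod."""
    if not pod or any(ch.isspace() for ch in pod):
        raise ValueError("pod name must be a single non-empty token")
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    lines = [
        f"sentinel monitor {pod} {master_ip} {master_port} {quorum}",
        # auth-pass takes the master (pod) name first, then the password
        f"sentinel auth-pass {pod} {password}",
        f"sentinel down-after-milliseconds {pod} {down_after_ms}",
        f"sentinel parallel-syncs {pod} {parallel_syncs}",
        f"sentinel failover-timeout {pod} {failover_timeout_ms}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sentinel_block("pod1", "192.0.2.10", 6379, "<password>"), end="")
```

The same string would then be written into each sentinel's config file; only the `bind`, `port`, `pidfile` and `logfile` lines should differ per sentinel.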

While you can run a separate sentinel constellation for each pod, it is more efficient to use one sentinel constellation to manage multiple pods. Also note that you don't want these sentinels running on the same host as the Redis instances they monitor; doing so risks losing them when you need them the most.
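Why losing sentinels hurts can be shown with a little arithmetic. As described in the Redis Sentinel documentation, the quorum only decides when a master is considered objectively down; the failover itself needs a leader elected by a majority of the known sentinels (and, as I understand the implementation, by at least `quorum` votes). The sketch below is a simplified model under that assumption, with a hypothetical function name; it ignores timing, epochs and network partitions.

```python
def can_fail_over(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """Simplified model: failover needs >= quorum agreeing sentinels
    AND a leader voted in by a majority of all known sentinels."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= quorum and reachable_sentinels >= majority


if __name__ == "__main__":
    # 3 sentinels, quorum 2: one host (and its sentinel) dies, 2 remain.
    print(can_fail_over(3, 2, 2))
    # Two sentinels shared the failed host: only 1 remains.
    print(can_fail_over(3, 1, 2))
```

With three sentinels spread over three separate hosts, losing any one host still leaves a majority; if two of them share the host that fails, the survivor cannot fail over on its own.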

Unfortunately I am not familiar with OpenShift, so I don't know how to get it to configure this for you. However, knowing what the config should look like should help you confirm whether it did so correctly.

The Real Bill