
I have an issue with my Redis Sentinel setup, which has 4 nodes (1 master, 3 slaves). I patched the first slave node (Docker was upgraded from 17.03.1-ce to 17.12.0-ce). Since then the master no longer accepts slave node1 back into the replica pool.

Slave (node1) info (it does recognize the master node):

$ docker exec -it redis-sentinel redis-cli info replication
# Replication
role:slave
master_host:<master_ip>
master_port:6379
master_link_status:down
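
When master_link_status is down, the replica's own Redis log usually explains why the sync is failing. A minimal check (the container name redis-client is an assumption, borrowed from the log commands further down in this question; adjust it to whatever the Redis container is called on node1):

$ docker logs --tail 100 redis-client | grep -iE "master|sync|error"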

Master info:

$ docker exec -it redis-sentinel redis-cli info replication
# Replication
role:master
connected_slaves:2    
slave0:ip=<slave_2_ip>,port=6379,state=online,offset=191580670534,lag=0   
slave1:ip=<slave_3_ip>,port=6379,state=online,offset=191580666435,lag=0
master_repl_offset:191580672343

The master should have 3 slaves. The master IP is configured correctly on node1 (the node that was patched). Nodes 2, 3 and 4 are still on Docker 17.03.1-ce. When I reproduced the same situation in development, everything worked fine. Can you suggest what I need to do to re-enable replication between the master and slave node1?
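
Sentinel's own view of the replica can also be cross-checked directly. A sketch, assuming the master is registered under the name sentinel-xx (taken from the Sentinel log lines below) and that Sentinel listens on the default port 26379:

$ docker exec -it redis-sentinel redis-cli -p 26379 sentinel master sentinel-xx
$ docker exec -it redis-sentinel redis-cli -p 26379 sentinel slaves sentinel-xx
# in the "sentinel slaves" output, the flags and master-link-status fields show
# whether Sentinel considers node1 reachable and its replication link healthy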

After restarting Docker on node1 I see warnings like this (msg="unknown container"):

Jan 31 08:16:12 node1 dockerd[17288]: time="2018-01-31T08:16:12.150892519+02:00" level=warning msg="unknown container" container=23e48b7846bd325ba5af772217085b60708660f5f5d8bb6fefd23094235ac01f module=libcontainerd namespace=plugins.moby
Jan 31 08:16:12 node1 dockerd[17288]: time="2018-01-31T08:16:12.177513187+02:00" level=warning msg="unknown container" container=23e48b7846bd325ba5af772217085b60708660f5f5d8bb6fefd23094235ac01f module=libcontainerd namespace=plugins.moby

When I examine the master logs on node4, I can see that node1 was converted to a slave:

1:X 30 Jan 21:35:09.301 # +sdown sentinel 66f6a8950a72952ac7df18f6a653718445fad5db node1_slave 26379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:35:10.276 # +sdown slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.388 * +reboot slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.473 # -sdown slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.473 # -sdown sentinel 66f6a8950a72952ac7df18f6a653718445fad5db node1_slave 26379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:20.436 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:30.516 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:40.529 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:39:48.284 * +reboot slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:39:58.391 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:40:08.447 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379

On the other hand, the redis-client logs show that the master cannot save the DB to disk:

$ docker logs --follow redis-client
1:M 31 Jan 07:47:09.451 * Slave node3_slave:6379 asks for synchronization
1:M 31 Jan 07:47:09.451 * Full resync requested by slave node3_slave:6379
1:M 31 Jan 07:47:09.451 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:47:09.452 # Can't save in background: fork: Out of memory
1:M 31 Jan 07:47:09.452 # BGSAVE for replication failed
1:M 31 Jan 07:47:24.628 * Slave node1_slave:6379 asks for synchronization
1:M 31 Jan 07:47:24.628 * Full resync requested by slave node1_slave:6379
1:M 31 Jan 07:47:24.628 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:47:24.628 # Can't save in background: fork: Out of memory
1:M 31 Jan 07:47:24.628 # BGSAVE for replication failed
1:M 31 Jan 07:48:10.560 * Slave node3_slave:6379 asks for synchronization
1:M 31 Jan 07:48:10.560 * Full resync requested by slave node3_slave:6379
1:M 31 Jan 07:48:10.560 * Starting BGSAVE for SYNC with target: disk

1 Answer


The problem was solved by switching vm.overcommit_memory to 1:

sysctl vm.overcommit_memory=1
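
To make the setting survive a reboot, it can also be persisted via the standard sysctl config mechanism (a sketch; the exact config file location may differ per distribution):

$ echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
$ sysctl -p    # reload the settings from /etc/sysctl.conf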

Thanks to yanhan's comment.

The log now looks like this:

1:M 31 Jan 07:48:10.560 * Slave node2_slave:6379 asks for synchronization
1:M 31 Jan 07:48:10.560 * Full resync requested by slave node2_slave:6379
1:M 31 Jan 07:48:10.560 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:48:10.569 * Background saving started by pid 16
1:M 31 Jan 07:49:15.773 # Connection with slave client id #388090 lost.
1:M 31 Jan 07:49:16.219 # Connection with slave node2_slave:6379 lost.
1:M 31 Jan 07:49:25.394 * Slave node1_slave:6379 asks for synchronization
1:M 31 Jan 07:49:25.395 * Full resync requested by slave node1_slave:6379
1:M 31 Jan 07:49:25.395 * Can't attach the slave to the current BGSAVE. Waiting for next BGSAVE for SYNC
1:S 31 Jan 07:49:35.421 # Connection with slave node1_slave:6379 lost.
1:S 31 Jan 07:49:35.518 * SLAVE OF node2_slave:6379 enabled (user request from 'id=395598 addr=node2_slave:33026 fd=7 name=sentinel-52caa67d-cmd age=10 idle=0 flags=x db=0     sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 31 Jan 07:49:36.121 * Connecting to MASTER node2_slave:6379
1:S 31 Jan 07:49:36.122 * MASTER <-> SLAVE sync started
1:S 31 Jan 07:49:36.135 * Non blocking connect for SYNC fired the event.
1:S 31 Jan 07:49:36.138 * Master replied to PING, replication can continue...
1:S 31 Jan 07:49:36.147 * Partial resynchronization not possible (no cached master)
1:S 31 Jan 07:49:36.153 * Full resync from master: f15e28b26604bda49ad515b38cba2639ee8e13bc:191935552685
1:S 31 Jan 07:49:46.523 * MASTER <-> SLAVE sync: receiving 1351833877 bytes from master
1:S 31 Jan 07:49:57.888 * MASTER <-> SLAVE sync: Flushing old data
16:C 31 Jan 07:50:17.083 * DB saved on disk
16:C 31 Jan 07:50:17.114 * RDB: 3465 MB of memory used by copy-on-write
1:S 31 Jan 07:51:22.749 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 31 Jan 07:51:46.609 * MASTER <-> SLAVE sync: Finished with success
1:S 31 Jan 07:51:46.609 * Background saving terminated with success
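
After the full resync completes, the master should list all three replicas again. Re-running the same check from the question (with a grep filter added) confirms it:

$ docker exec -it redis-sentinel redis-cli info replication | grep -E "^(role|connected_slaves|slave[0-9])"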