Setup
1 Redis master
3 client Redis slaves (for reads), each with an SSH tunnel to the master for writes
Each Redis instance runs on a different server in a different location.
Requirement
Keep a persistent connection. When a slave disconnects, it should be able to reconnect and resync with the master as quickly as possible (< 1 minute).
sshd on master
TCPKeepAlive no
autossh configuration file
REDIS_SLAVE_PORT=6379 REDIS_MASTER_PORT=6379
AUTOSSH_FIRST_POLL=5 AUTOSSH_POLL=11
AUTOSSH_PORT=20000
AUTOSSH_GATETIME=10
AUTOSSH_LOGFILE=/home/xxx/autossh.log
AUTOSSH_PATH=/usr/bin/ssh
AUTOSSH_PIDFILE=/home/xxx/autossh.pid
AUTOSSH_LOGLEVEL=7 # FOR DEBUG
export AUTOSSH_POLL AUTOSSH_LOGFILE AUTOSSH_PATH AUTOSSH_GATETIME AUTOSSH_PORT AUTOSSH_PIDFILE AUTOSSH_FIRST_POLL AUTOSSH_LOGLEVEL
autossh -2 -fN -M ${AUTOSSH_PORT} -C -L ${REDIS_SLAVE_PORT}:localhost:${REDIS_MASTER_PORT} -i /home/xxx/.ssh/id_rsa user@master_ip
Each slave has its own AUTOSSH_PORT: 20000, 20002, 20004, etc.
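A small sketch of the port scheme above. The autossh man page notes that, without an echo port, autossh uses both the -M port and the port immediately above it, which is presumably why the monitor ports are spaced two apart. The zero-based slave index and the helper name monitor_port_for are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a per-slave autossh monitor port.
# autossh -M N (no echo port) uses ports N and N+1, so slots are 2 apart.
MONITOR_BASE_PORT=20000

monitor_port_for() {
  # $1: zero-based slave index (assumption: slaves are numbered 0, 1, 2, ...)
  echo $(( MONITOR_BASE_PORT + 2 * $1 ))
}

# Example: the third slave (index 2) would use -M 20004 (and implicitly 20005).
AUTOSSH_PORT=$(monitor_port_for 2)
echo "$AUTOSSH_PORT"
```

Keeping the base and step in one place means adding a fourth slave cannot accidentally reuse the N+1 port of a neighbouring slot.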
Problem
Usually, when the connection between a slave and the master dies, it is re-established pretty quickly. From the autossh log:
autossh[pid]: timeout polling to accept read connection
autossh[pid]: port down, restarting ssh
autossh[pid]: checking for grace period, tries = 0
autossh[pid]: starting ssh (count x)
autossh[pid]: ssh child pid is xxx
autossh[pid]: check on child xxx
autossh[pid]: set alarm for 5 secs
autossh[pid]: execing /usr/bin/ssh
autossh[pid]: connection ok
However, sometimes a different kind of disconnect seems to happen, one that does not recover even after 10 minutes of retrying:
autossh[pid]: 127.0.0.1:20000: Connection refused
autossh[pid]: port down, restarting ssh
The only way I have found so far to make it reconnect is to change the monitor port in the autossh configuration file by hand. That is not a good solution unless I can somehow automate it and free the port.
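One possible way to approach automating the manual port change is to probe whether anything is still listening on a candidate monitor port pair before (re)starting autossh, and step to the next pair if so. This is only a sketch under several assumptions: it relies on bash's /dev/tcp pseudo-device, the helper names port_in_use and find_free_monitor_port are hypothetical, and a successful connection only shows that something is listening, not which process holds the port or whether it could be bound. The log does not make clear on which host the stale port is held, so run it on whichever host (slave or master) you suspect.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (require bash for /dev/tcp).

port_in_use() {
  # Succeeds if something accepts a TCP connection on host $1, port $2.
  # The subshell closes fd 3 again when it exits.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

find_free_monitor_port() {
  # $1: host to probe, $2: first candidate port, $3: highest usable port.
  # Prints the first port P (stepping by 2) where neither P nor P+1 answers.
  local host=$1 port=$2 max=$3
  while (( port + 1 <= max )); do
    if ! port_in_use "$host" "$port" && ! port_in_use "$host" "$((port + 1))"; then
      echo "$port"
      return 0
    fi
    port=$(( port + 2 ))
  done
  return 1  # no free pair in range
}

# Example: pick a monitor port for this slave, starting at 20000.
if AUTOSSH_PORT=$(find_free_monitor_port 127.0.0.1 20000 20098); then
  echo "using monitor port $AUTOSSH_PORT"
else
  echo "no free monitor port pair found" >&2
fi
```

The chosen value could then be exported as AUTOSSH_PORT before the autossh line in the configuration file; whether this actually avoids the "Connection refused" loop has not been verified.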
I tried to use the inetd echo service on port 7 (-M 20000:7) as a workaround for the monitor port being in use, but for some reason that didn't work. Here is my autossh log from the attempt:
autossh[pid]: not what I sent: "ubuntu autossh 10670 1817574984 " : ""
After that, autossh dies.
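The log line above suggests autossh received an empty reply from the echo port. A quick way to check, independently of autossh, whether the echo service answers at all is to send one line and compare what comes back. This sketch assumes bash (/dev/tcp and read -t); the function name echo_check, its exit codes, and the test string are hypothetical. Running it on the master against 127.0.0.1 port 7 would test inetd directly; running it on a slave against the local monitor port would presumably exercise the whole tunnel path.

```shell
#!/usr/bin/env bash
# Hypothetical check: does an echo service on host $1, port $2 echo a line back?
# Exit status: 0 = echoed correctly, 1 = wrong or empty reply, 2 = no connection.
echo_check() (
  host=$1 port=$2
  msg="echo-check-$$"
  { exec 3<>"/dev/tcp/$host/$port"; } 2>/dev/null || exit 2
  printf '%s\n' "$msg" >&3
  reply=""
  IFS= read -r -t 5 reply <&3   # wait at most 5 seconds for the echo
  if [[ $reply == "$msg" ]]; then
    echo "ok"
    exit 0
  fi
  echo "unexpected reply: \"$reply\"" >&2
  exit 1
)

# Example (on the master): echo_check 127.0.0.1 7
```

An exit status of 1 with an empty reply would match what autossh logged, which would point at the echo service itself rather than at the tunnel.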