I have a 3-box Redis Sentinel setup:

 CLIENT (connects to S1)
          |
          ↓
       +----+
       | M1 | us-east-1
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+
us-east-2      us-west-2

M1 - Master
S1 - Sentinel 1
S2 - Sentinel 2
S3 - Sentinel 3
R2 - First slave (R=replica)
R3 - Second slave

After my master died, Sentinel failed over to R2. I brought M1 back online (cleared some disk space), and now M1 is alive and well but is a slave of R2. Is there an automatic (or semi-automatic) way to make M1 the master again, with R2 as a slave of M1, so that my traffic uses M1 as the master Redis instance again?

Essentially I want to revert to how it was prior to failover.

What currently happens is that it elects R2 as a master and reconfigures it to be:

CLIENT (connects to S1)
          |
          ↓
       +----+
       |[R2]| us-east-2
       | S2 |
       +----+
          |
+----+    |    +----+
|[M1]|----+----| R3 |
| S1 |         | S3 |
+----+         +----+
us-east-1      us-west-2

When I fail over manually, it promotes R3 to master (which is kind of expected).

But then when I failover again manually, it promotes R2, but I would expect it to promote M1.

All successive failovers rotate between R2 and R3 (while always keeping M1 as a slave of either).

My M1 slave priority is unspecified, so it has the default value of 100. My R2 slave priority is 200 and R3's is 300. That leads me to think that it should rotate through all 3 boxes, but it rotates only between R2 and R3 after the initial failover.
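For reference, the Sentinel documentation describes the promotion choice as an ordering: replicas judged unhealthy or with priority 0 are skipped, then the lowest priority value wins, then the highest replication offset, then the lexicographically smallest run ID. The following is only a rough sketch of that ordering, not Sentinel's actual code; the `Replica` class, the `link_ok` flag (a stand-in for Sentinel's health checks), and the offsets and run IDs are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    priority: int         # slave-priority / replica-priority; 0 means "never promote"
    repl_offset: int      # hypothetical replication offset
    run_id: str           # hypothetical run ID
    link_ok: bool = True  # assumption: stands in for Sentinel's health checks


def pick_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Return the replica the documented ordering would prefer, or None."""
    eligible = [r for r in replicas if r.link_ok and r.priority != 0]
    if not eligible:
        return None
    # Lower priority first, then higher offset, then smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


# R2 is the current master, so the candidates are M1 and R3.
healthy = [Replica("M1", 100, 5000, "aaa"), Replica("R3", 300, 5000, "bbb")]
broken = [Replica("M1", 100, 0, "aaa", link_ok=False), Replica("R3", 300, 5000, "bbb")]

print(pick_candidate(healthy).name)  # M1
print(pick_candidate(broken).name)   # R3
```

Under this ordering a healthy M1 with priority 100 would be the first choice, so a rotation that never reaches M1 suggests that Sentinel does not consider M1 a usable replica.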

This looks like a Sentinel bug to me.

Valentin V
  • Did you ever find a solution? I have the same problem. I tried stopping both the new master and the slave to force a failover to the original master, but then Sentinel says "failover-abort-no-good-slave master mymaster". I suspected a problem with quorum, so I tried killing the original master to see whether failover to the one remaining slave would also fail, but it worked. So for some reason the original master is no longer seen as a good slave. I suspect you might have the same situation. – Adriaan Jan 05 '19 at 16:49
  • @Adriaan I didn't find a solution to this yet, I just run the old setup. Whenever a failover occurs it picks a new master. I have a hope for it to work in Redis 5, but didn't get a chance to try it. – Valentin V Jan 06 '19 at 17:39
  • I am running Redis 5 in this setup where I'm having my problem of the original master not being elected. It's so strange. It must be a configuration problem, because it really shouldn't make a difference to Redis or Sentinel that the particular Redis instance was the master initially. I'll have to analyse the config files of all instances again... – Adriaan Jan 06 '19 at 18:21
  • I found my problem... as I expected, it was a configuration problem. I set up each node with a password, tested the original setup, and the slaves replicated nicely. But I never set masterauth "PASSWRD" on the original master, so it didn't replicate correctly when it became a slave, which is why Sentinel (wisely) refused to promote it to master. I suggest you check that your original master is replicating correctly. – Adriaan Jan 07 '19 at 22:41

2 Answers

I think kiddorails's answer is correct, but most probably you have a problem similar to the one I had, where for some reason your original master is not replicating correctly. Once I fixed my replication issue, I could cycle through my masters by issuing SENTINEL FAILOVER mymaster. Initially it would just bounce between the two original slaves, but now that my original master is replicating correctly, it cycles through all 3. So I would recommend checking the replication of your original master after a failover. If you are sure it is working, you could also stop the other slave and then use the SENTINEL FAILOVER mymaster command to force a failover to the original master. If that fails, you know there must be an issue with the replication.
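One way to check this is to look at the output of `redis-cli INFO replication` on the original master after it has been demoted. Below is a minimal sketch that parses that text and flags the obvious problems. The field names (`role`, `master_link_status`, `master_sync_in_progress`) are real INFO fields, but the sample text, the host address, and the function names are made up for illustration.

```python
from typing import Dict, List


def parse_info(text: str) -> Dict[str, str]:
    """Parse 'key:value' lines from INFO output, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replication_problems(info: Dict[str, str]) -> List[str]:
    """Return human-readable reasons why this node looks like a bad replica."""
    if info.get("role") != "slave":
        return [f"role is {info.get('role')!r}, expected 'slave'"]
    problems = []
    status = info.get("master_link_status")
    if status != "up":
        problems.append(f"master_link_status is {status!r}")
    if info.get("master_sync_in_progress") == "1":
        problems.append("initial sync still in progress")
    return problems


# Hypothetical output captured on M1 after the failover (not a real capture).
SAMPLE = """\
# Replication
role:slave
master_host:10.0.0.2
master_port:6379
master_link_status:down
master_sync_in_progress:0
"""

print(replication_problems(parse_info(SAMPLE)))
```

An empty list means the basic checks passed; a link status other than `up` is the kind of state in which I would not expect Sentinel to promote the node.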

Adriaan
  • This is correct. From the log files I figured out that there was no replication because I had missed `masterauth` on the master: `9047:S 12 Nov 16:22:30.995 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required`. Once I fixed it and replication happened, failover worked! – route Nov 12 '19 at 16:33
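
The missing `masterauth` case described above can be caught before a failover by scanning each node's config for `requirepass` without a matching `masterauth`. This is a rough sketch that only looks at the config text; the function names and the sample config are hypothetical, and it does not verify that the two passwords actually match.

```python
from typing import Dict


def read_directives(conf_text: str) -> Dict[str, str]:
    """Collect 'name value' directives from redis.conf-style text."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Names are lower-cased here only to simplify the comparison below.
        directives[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return directives


def missing_masterauth(conf_text: str) -> bool:
    """True if the node requires a password but has no masterauth set."""
    directives = read_directives(conf_text)
    return "requirepass" in directives and "masterauth" not in directives


# Hypothetical config of the original master (placeholder password).
ORIGINAL_MASTER_CONF = """\
# original master
port 6379
requirepass "PASSWRD"
"""

print(missing_masterauth(ORIGINAL_MASTER_CONF))  # True
```

A node flagged here can serve as a master, but it would fail to authenticate against the new master once it is demoted to a slave.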

I am not sure why you want to do that in the first place. After Redis fails over to R2 and uses it as the master, it should work just as well as the original M1 instance. If that's not the case, you are not actually using Sentinel correctly for high availability.

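You can just trigger a manual failover with SENTINEL FAILOVER <master-name>, where <master-name> is the name the Sentinels monitor (for example, mymaster) rather than the instance R2 itself. It should switch to either M1 or R3.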
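
If the goal is to end up on one specific box, the manual failover can be repeated until Sentinel reports that box as master. The sketch below shows only the loop; `get_master` and `trigger_failover` are hypothetical callables you would wire to your own client (for instance, a Sentinel master lookup and a call that sends SENTINEL FAILOVER mymaster), and the attempt count and settle time are arbitrary assumptions. Each failover interrupts clients, so treat this as an illustration rather than a recommended procedure.

```python
import time
from typing import Callable, Tuple

Address = Tuple[str, int]


def failover_until(
    preferred: Address,
    get_master: Callable[[], Address],
    trigger_failover: Callable[[], None],
    max_attempts: int = 3,
    settle_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Trigger failovers until `preferred` is master or attempts run out."""
    for _ in range(max_attempts):
        if get_master() == preferred:
            return True
        trigger_failover()
        sleep(settle_seconds)  # give Sentinel time to finish reconfiguring
    return get_master() == preferred


# Fake cluster for demonstration: each failover moves to the next address.
order = [("r2", 6379), ("r3", 6379), ("m1", 6379)]
state = {"i": 0}


def fake_get_master() -> Address:
    return order[state["i"]]


def fake_trigger_failover() -> None:
    state["i"] = min(state["i"] + 1, len(order) - 1)


print(failover_until(("m1", 6379), fake_get_master, fake_trigger_failover,
                     sleep=lambda seconds: None))  # True
```

Nothing in the loop forces Sentinel to pick a particular box; it only stops once the preferred one has been chosen, and gives up after `max_attempts`.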

kiddorails
  • As you can see from my diagram, R2 is in Ohio (us-east-2), while M1 is in Northern Virginia (us-east-1). I want to switch back to reduce latency. @kiddorails – Valentin V Jul 05 '18 at 18:39