ServiceStack.Redis WaitBeforeForcingMasterFailover

Question

Context:

I'm trying to understand the motivation behind the existence of the WaitBeforeForcingMasterFailover property (and the code associated with it) inside ServiceStack.Redis.RedisSentinel.

If I interpreted the code right - the meaning behind this property seems to cover cases like:

We have a connection to a healthy sentinel that tells us that a master is at host X
When we try to establish a connection to the master at host X - we fail due to some reason.

So the logic will be - if we continuously fail to create a connection to X for WaitBeforeForcingMasterFailover period - initiate a force failover.

The failover does not need to reach a quorum and can elect a new master just with 1 sentinel available.

SENTINEL FAILOVER Force a failover as if the master was not reachable, and without asking for agreement to other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).

Source: https://redis.io/topics/sentinel#sentinel-api

The way it seems to me - this feature can be beneficial in some cases and troublesome in other cases.

For example, in case of a network partition, if a client is left connected to a minority of sentinels (they can't reach a quorum) and these sentinels point to a master that is no longer reachable, this force failover option will trigger a failover within the reachable partition, thus potentially creating a split-brain situation.

Coming from a Java background, I also haven't seen such a feature in popular Redis clients such as Jedis and Lettuce.

This got me wondering on the following questions:

Are there strong reasons for this feature to be enabled by default? (I understand that you can effectively disable it by setting a huge value for it.) Are they really worth the risk of interfering with the natural sentinel workflow and potentially introducing problems like the one I've mentioned above?
Will the library work fine with this option disabled? Are there cases I might have missed where turning this feature off will lead to problems even on happy paths (no network partition, just regular failovers because of a deployment or a sudden node failure)?

score 1 · Accepted Answer · answered Apr 30 '21 at 04:41

It's a fallback: if RedisSentinel is unable to establish a connection to a master client within 60s (default), it will instruct the connected sentinel to force a failover.

You can increase the wait time when configuring RedisSentinel:

new RedisSentinel {
    WaitBeforeForcingMasterFailover = TimeSpan.FromSeconds(...)
}

The alternative to forcing a failover is that each client trying to use Redis will continue to fail until the sentinels reach consensus that the master is unresponsive. If the default 60s is too short, you should increase it to the maximum amount of time that is acceptable for your App to remain unresponsive.

Will the library work fine with this option disabled?

It's only a fallback that occurs when it's unable to establish a connection with a Redis Client. Extending it won't stop RedisSentinel from working, but anything trying to use Redis will not work until it's able to establish a valid connection with a Redis Client.
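
As a rough sketch of what effectively disabling the fallback could look like: the sentinel host names and the "mymaster" master name below are hypothetical placeholders, and the one-day wait is an arbitrary large value chosen for illustration, not a recommended setting.

using System;
using ServiceStack.Redis;

// Hypothetical sentinel addresses and master name, for illustration only.
var sentinelHosts = new[] { "sentinel1:26379", "sentinel2:26379", "sentinel3:26379" };

var sentinel = new RedisSentinel(sentinelHosts, "mymaster")
{
    // Assumption: with a wait this long the forced-failover fallback is
    // effectively never reached, so failovers are left to the sentinels.
    WaitBeforeForcingMasterFailover = TimeSpan.FromDays(1)
};

// Start() begins monitoring the sentinels and returns the client manager.
IRedisClientsManager redisManager = sentinel.Start();

With a setting like this, clients would keep failing to connect until the sentinels themselves promote a new master.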

When the fallback does occur your error logs should contain the templated string:

"Valid master was not found at '{0}' within '{1}'. Sending SENTINEL failover..."

If your error logs don't contain this, the timeout was never exceeded and a failover was never forced by RedisSentinel.
