My question is about the LeaderLatch recipe in Apache Curator.
I want to use LeaderLatch to implement a mutex for a scheduled job. There is an additional requirement: if the scheduled job starts at 1:00:00.005 PM and ends at 1:00:00.015 PM, no other job or instance should start the same task until 1:00:30.000 PM. To achieve this, I was thinking about releasing the latch asynchronously from within the job.
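To make the timing requirement concrete, here is a minimal sketch of the delayed release I have in mind. It uses only the JDK and makes no Curator calls. It assumes the hold window is aligned to 30-second wall-clock boundaries (which is how I read the 1:00:30.000 PM example). The class name, the method names and the `release` callback are hypothetical placeholders for whatever actually gives up leadership.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class DelayedRelease {

    // End of the slot that contains jobStart. Assumption: slots are aligned
    // to multiples of `period` counted from the epoch (e.g. :00 and :30).
    static Instant slotEnd(Instant jobStart, Duration period) {
        long periodMs = period.toMillis();
        long slotStartMs = Math.floorDiv(jobStart.toEpochMilli(), periodMs) * periodMs;
        return Instant.ofEpochMilli(slotStartMs + periodMs);
    }

    // How long the mutex must still be held after the job has finished.
    // Returns zero if the job overran its slot.
    static Duration remainingHold(Instant jobStart, Instant jobEnd, Duration period) {
        Duration remaining = Duration.between(jobEnd, slotEnd(jobStart, period));
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    // Schedules the (hypothetical) release callback instead of releasing inline.
    static ScheduledFuture<?> releaseLater(ScheduledExecutorService scheduler,
                                           Runnable release, Duration delay) {
        return scheduler.schedule(release, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // The example from the question, on an arbitrary date.
        Instant start = Instant.parse("2024-01-01T13:00:00.005Z");
        Instant end = Instant.parse("2024-01-01T13:00:00.015Z");
        Duration hold = remainingHold(start, end, Duration.ofSeconds(30));
        System.out.println("Hold the mutex for another " + hold); // PT29.985S

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // Placeholder: in the real job this would give up leadership.
            Runnable release = () -> System.out.println("released");
            // A short demo delay is used here so the example exits quickly;
            // the real job would pass `hold`.
            releaseLater(scheduler, release, Duration.ofMillis(50)).get();
        } finally {
            scheduler.shutdown();
        }
    }
}
```

With the times from the example, the release would be scheduled 29.985 seconds after the job ends, i.e. at 1:00:30.000 PM.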
From the docs: https://curator.apache.org/curator-recipes/leader-latch.html
Error Handling
LeaderLatch instances add a ConnectionStateListener to watch for connection problems. If SUSPENDED or LOST is reported, the LeaderLatch that is the leader will report that it is no longer the leader (i.e. there will not be a leader until the connection is re-established). If a LOST connection is RECONNECTED, the LeaderLatch will delete its previous ZNode and create a new one.
Users of LeaderLatch must take account that connection issues can cause leadership to be lost. i.e. hasLeadership() returns true but some time later the connection is SUSPENDED or LOST. At that point hasLeadership() will return false. It is highly recommended that LeaderLatch users register a ConnectionStateListener.
If I understand correctly, if the leader I1 (instance 1) goes down, the other instances will wait until I1 comes back online and re-establishes the connection. But what happens if I1 never comes back up? Will the other instances be able to become leaders? How and when? Or will the other instances be locked out forever? How can they be unlocked?
My expectation is that, behind the scenes, there is some kind of timeout on the leader's connection. It might be related to how the Curator client is configured, or a re-election might happen once the connection is lost. But none of this is described in the error handling section quoted above, nor in https://curator.apache.org/errors.html