23

While reading ZooKeeper's recipe for locks, I got confused. It seems that this recipe for distributed locks cannot guarantee that "at any snapshot in time no two clients think they hold the same lock". But since ZooKeeper is so widely adopted, if there were such a mistake in the reference documentation, someone would have pointed it out long ago. So what did I misunderstand?

Quoting the recipe for distributed locks (sketched in code after the quote):

Locks

Fully distributed locks that are globally synchronous, meaning at any snapshot in time no two clients think they hold the same lock. These can be implemented using ZooKeeper. As with priority queues, first define a lock node.

  1. Call create( ) with a pathname of "locknode/guid-lock-" and the sequence and ephemeral flags set.
  2. Call getChildren( ) on the lock node without setting the watch flag (this is important to avoid the herd effect).
  3. If the pathname created in step 1 has the lowest sequence number suffix, the client has the lock and the client exits the protocol.
  4. The client calls exists( ) with the watch flag set on the path in the lock directory with the next lowest sequence number.
  5. If exists( ) returns false, go to step 2. Otherwise, wait for a notification for the pathname from the previous step before going to step 2.
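For concreteness, here is roughly what those five steps look like against the raw ZooKeeper Java API. This is only a minimal sketch: the class name is mine, the GUID handling described in the docs is left out, and there is no error or retry handling.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class LockRecipeSketch {
    private final ZooKeeper zk;
    private String myNode; // e.g. "/locknode/guid-lock-0000000007"

    LockRecipeSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    void acquire() throws KeeperException, InterruptedException {
        // Step 1: create an ephemeral, sequential child under the lock node.
        myNode = zk.create("/locknode/guid-lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String mySuffix = myNode.substring(myNode.lastIndexOf('/') + 1);
        while (true) {
            // Step 2: list the children without setting a watch (avoids the herd effect).
            List<String> children = zk.getChildren("/locknode", false);
            Collections.sort(children);
            int myIndex = children.indexOf(mySuffix);
            // Step 3: lowest sequence number suffix -> this client holds the lock.
            if (myIndex == 0) {
                return;
            }
            // Step 4: watch only the child with the next lowest sequence number.
            CountDownLatch gone = new CountDownLatch(1);
            String predecessor = "/locknode/" + children.get(myIndex - 1);
            if (zk.exists(predecessor, event -> gone.countDown()) == null) {
                continue; // Step 5: it is already gone, so go back to step 2.
            }
            gone.await(); // Step 5: wait for the notification, then go back to step 2.
        }
    }
}
```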

Consider the following case:

  • Client1 successfully acquired the lock (in step 3), with ZooKeeper node "locknode/guid-lock-0";
  • Client2 created node "locknode/guid-lock-1", failed to acquire the lock, and is now watching "locknode/guid-lock-0";
  • Later, for some reason (say, network congestion), Client1 fails to send a heartbeat message to the ZooKeeper cluster on time, but Client1 is still working away, mistakenly assuming that it still holds the lock.
  • But ZooKeeper may think Client1's session has timed out, and then

    1. delete "locknode/guid-lock-0",
    2. send a notification to Client2 (or maybe send the notification first?),
    3. but cannot send a "session timeout" notification to Client1 in time (say, due to network congestion).
  • Client2 gets the notification, goes to step 2, gets the only node "locknode/guid-lock-1", which it created itself; thus, Client2 assumes it holds the lock.
  • But at the same time, Client1 assumes it holds the lock.

Is this a valid scenario?

asked by hulunbier
3 Answers

16

The scenario you describe could arise. Client 1 thinks it has the lock, but in fact its session has timed out, and Client 2 acquires the lock.

The ZooKeeper client library will inform Client 1 that its connection has been disconnected (but the client doesn't know the session has expired until it reconnects to the server), so the client code can assume that its lock has been lost if it has been disconnected for too long. The thread that uses the lock still needs to check periodically that the lock is still valid, which is inherently racy.
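One way to act on that advice is sketched below; the class name, the 10-second threshold, and the lockProbablyLost() helper are illustrative assumptions, not part of any ZooKeeper API. The idea is to register this object as the default watcher when constructing the ZooKeeper handle and have the lock-holding thread poll it.

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class ConnectionStateTracker implements Watcher {
    // Illustrative threshold; in practice you would tie this to the session timeout.
    private static final long MAX_DISCONNECT_MILLIS = 10_000;

    private final AtomicLong disconnectedSince = new AtomicLong(-1); // -1 == currently connected
    private volatile boolean expired = false;

    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:   // link lost; the session may still be alive on the server
                disconnectedSince.compareAndSet(-1, System.currentTimeMillis());
                break;
            case SyncConnected:  // reconnected before the session expired
                disconnectedSince.set(-1);
                break;
            case Expired:        // the server has confirmed the session is gone
                expired = true;
                break;
            default:
                break;
        }
    }

    /** The lock-holding thread calls this periodically; as noted above, the check is inherently racy. */
    boolean lockProbablyLost() {
        if (expired) {
            return true;
        }
        long since = disconnectedSince.get();
        return since >= 0 && System.currentTimeMillis() - since > MAX_DISCONNECT_MILLIS;
    }
}
```

For example, pass an instance as the watcher to `new ZooKeeper(connectString, sessionTimeout, tracker)` and abandon the protected work as soon as `lockProbablyLost()` returns true. Even then there is a window in which two clients act as the lock holder, which is exactly the race the question describes.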

answered by sbridges
  • thanks; does this mean that the "invariant" -- **at any snapshot in time no two clients think they hold the same lock** -- does not hold? – hulunbier Jan 11 '13 at 14:45
  • 4
    Another trivial way to violate the "at any snapshot in time" invariant (and thus prove the statement false) is a long GC pause on the client that is holding the lock. E.g.: client C acquires the lock; while C holds it, a Java GC pause freezes the process for 60 seconds. After 10 of those seconds, C's session expires, and another process acquires the lock. Time 11 is "a snapshot in time when 2 clients think they hold the same lock". – Marco Jan 29 '15 at 21:48
  • 3
    Just found this interesting post: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html – zsxwing Jun 03 '16 at 20:48
  • 1
    **at any snapshot in time no two clients think they hold the same lock** is based on the following assumptions: bounded network delay, bounded process pauses and bounded clock error. – zsxwing Jun 03 '16 at 20:49
  • This curator technote applies to the GC pause case: https://cwiki.apache.org/confluence/display/CURATOR/TN10 – Raman Mar 30 '20 at 22:00
0

...But ZooKeeper may think Client1's session has timed out, and then...

From the ZooKeeper documentation:

  • The removal of a node will only cause one client to wake up since each node is watched by exactly one client. In this way, you avoid the herd effect.
  • There is no polling or timeouts.

So I don't think the problem you describe arises. It looks to me as though there could be a risk of hanging locks if something happens to the clients that create them, but the scenario you describe should not arise.

answered by glenatron
  • thanks; but I do not quite follow; to me, "avoid herd effect" && "no polling or timeouts" cannot ensure that Client2 will not acquire the lock while Client1 still holds it. Besides, an ephemeral node will be deleted automatically by ZK after the session times out, so I cannot see "the risk of hanging locks" either... – hulunbier Jan 11 '13 at 12:34
  • If Client1's session has timed out then the lock no longer exists in that case, but Client1 will be getting SESSION_EXPIRED or CONNECTION_LOSS back from Zookeeper at that time, so they will know that they have lost connectivity. – glenatron Jan 11 '13 at 13:55
  • But what if the SESSION_EXPIRED message was not sent to Client1 **in time**? Due to a temporarily high packet loss rate, for example (the TCP connection remains ESTABLISHED). – hulunbier Jan 11 '13 at 14:32
  • You get a CONNECTION_LOSS, not a SESSION_EXPIRED. You have to be connected to ZooKeeper to get a SESSION_EXPIRED. – sbridges Jan 12 '13 at 03:23
0

From the Packt book *ZooKeeper Essentials*:

If there was a partial failure in the creation of znode due to connection loss, it's possible that the client won't be able to correctly determine whether it successfully created the child znode. To resolve such a situation, the client can store its session ID in the znode data field or even as a part of the znode name itself. As a client retains the same session ID after a reconnect, it can easily determine whether the child znode was created by it by looking at the session ID.
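A rough sketch of that idea, putting the session ID in the znode name rather than the data field (the class name, method name, and the "/locknode" path are placeholders, not from the book):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class SessionAwareLockNode {
    /**
     * Creates "/locknode/<sessionId>-lock-<seq>" unless a node from this session already exists.
     * The session ID is retained across reconnects, so after a connection loss during create()
     * the client can tell whether its earlier attempt actually succeeded.
     */
    static String createOrRecover(ZooKeeper zk) throws KeeperException, InterruptedException {
        String prefix = Long.toHexString(zk.getSessionId()) + "-lock-";
        for (String child : zk.getChildren("/locknode", false)) {
            if (child.startsWith(prefix)) {
                return "/locknode/" + child; // an earlier, half-acknowledged create() did succeed
            }
        }
        return zk.create("/locknode/" + prefix, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }
}
```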

answered by Steven Wong