Better solution to multithreading riddle?

Question

Here's the task: I need to lock based on a filename. There can be up to a million different filenames. (This is used for large-scale disk-based caching). I want low memory usage and low lookup times, which means I need a GC'd lock dictionary. (Only in-use locks can be present in the dict).

The callback action can take minutes to complete, so a global lock is unacceptable. High throughput is critical.

I've posted my current solution below, but I'm unhappy with the complexity.

EDIT: Please do not post solutions that are not 100% correct. For example, a solution which permits a lock to be removed from the dictionary between the 'get lock object' phase and the 'lock' phase is NOT correct, whether or not it is an 'accepted' design pattern or not.

Is there a more elegant solution than this?

Thanks!

[EDIT: I updated my code to use looping vs. recursion based on RobV's suggestion]

[EDIT: Updated the code again to allow 'timeouts' and a simpler calling pattern. This will probably be the final code I use. Still the same basic algorithm as in the original post.]

[EDIT: Updated code again to deal with exceptions inside callback without orphaning lock objects]

public delegate void LockCallback();
/// <summary>
/// Provides locking based on a string key. 
/// Locks are local to the LockProvider instance.
/// The class handles disposing of unused locks. Generally used for 
/// coordinating writes to files (of which there can be millions). 
/// Only keeps key/lock pairs in memory which are in use.
/// Thread-safe.
/// </summary>
public class LockProvider {

    /// <summary>
    /// The only objects in this collection should be for open files. 
    /// </summary>
    protected Dictionary<String, Object> locks = 
                    new Dictionary<string, object>(StringComparer.Ordinal);
    /// <summary>
    /// Synchronization object for modifications to the 'locks' dictionary
    /// </summary>
    protected object createLock = new object();
    /// <summary>
    /// Attempts to execute the 'success' callback inside a lock based on 'key'.  If successful, returns true.
    /// If the lock cannot be acquired within 'timoutMs', returns false
    /// In a worst-case scenario, it could take up to twice as long as 'timeoutMs' to return false.
    /// </summary>
    /// <param name="key"></param>
    /// <param name="success"></param>
    /// <param name="failure"></param>
    /// <param name="timeoutMs"></param>
    public bool TryExecute(string key, int timeoutMs, LockCallback success){
        //Record when we started. We don't want an infinite loop.
        DateTime startedAt = DateTime.UtcNow;

        // Tracks whether the lock acquired is still correct
        bool validLock = true; 
        // The lock corresponding to 'key'
        object itemLock = null;

        try {
            //We have to loop until we get a valid lock and it stays valid until we lock it.
            do {
                // 1) Creation/aquire phase
                lock (createLock) {
                    // We have to lock on dictionary writes, since otherwise 
                    // two locks for the same file could be created and assigned
                    // at the same time. (i.e, between TryGetValue and the assignment)
                    if (!locks.TryGetValue(key, out itemLock))
                        locks[key] = itemLock = new Object(); //make a new lock!

                }
                // Loophole (part 1):
                // Right here - this is where another thread (executing part 2) could remove 'itemLock'
                // from the dictionary, and potentially, yet another thread could 
                // insert a new value for 'itemLock' into the dictionary... etc, etc..

                // 2) Execute phase
                if (System.Threading.Monitor.TryEnter(itemLock, timeoutMs)) {
                    try {
                        // May take minutes to acquire this lock. 

                        // Trying to detect an occurence of loophole above
                        // Check that itemLock still exists and matches the dictionary
                        lock (createLock) {
                            object newLock = null;
                            validLock = locks.TryGetValue(key, out newLock);
                            validLock = validLock && newLock == itemLock;
                        }
                        // Only run the callback if the lock is valid
                        if (validLock) {
                            success(); // Extremely long-running callback, perhaps throwing exceptions
                            return true;
                        }

                    } finally {
                        System.Threading.Monitor.Exit(itemLock);//release lock
                    }
                } else {
                    validLock = false; //So the finally clause doesn't try to clean up the lock, someone else will do that.
                    return false; //Someone else had the lock, they can clean it up.
                }

                //Are we out of time, still having an invalid lock?
                if (!validLock && Math.Abs(DateTime.UtcNow.Subtract(startedAt).TotalMilliseconds) > timeoutMs) {
                    //We failed to get a valid lock in time. 
                    return false;
                }


                // If we had an invalid lock, we have to try everything over again.
            } while (!validLock);
        } finally {
            if (validLock) {
                // Loophole (part 2). When loophole part 1 and 2 cross paths,
                // An lock object may be removed before being used, and be orphaned

                // 3) Cleanup phase - Attempt cleanup of lock objects so we don't 
                //   have a *very* large and slow dictionary.
                lock (createLock) {
                    //  TryEnter() fails instead of waiting. 
                    //  A normal lock would cause a deadlock with phase 2. 
                    //  Specifying a timeout would add great and pointless overhead.
                    //  Whoever has the lock will clean it up also.
                    if (System.Threading.Monitor.TryEnter(itemLock)) {
                        try {
                            // It succeeds, so no-one else is working on it 
                            // (but may be preparing to, see loophole)
                            // Only remove the lock object if it 
                            // still exists in the dictionary as-is
                            object existingLock = null;
                            if (locks.TryGetValue(key, out existingLock)
                                && existingLock == itemLock)
                                locks.Remove(key);
                        } finally {
                            // Remove the lock
                            System.Threading.Monitor.Exit(itemLock);
                        }
                    }
                }
            }
        }
        // Ideally the only objects in 'locks' will be open operations now.
        return true;
    }
}

Usage example

LockProvider p = new LockProvider();
bool success = p.TryExecute("filename",1000,delegate(){
  //This code executes within the lock
});

Can you use .NET4? If so, you might be interested in the [System.Collections.Concurrent namespace](http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx). — R. Martinho Fernandes, Apr 06 '11 at 11:15
Stepping back a lot, if you are coordinating writes to a file system, can't you open files exclusively to achieve locking without using this kind of approach? — Jeff Foster, Apr 06 '11 at 11:16
@Martinho I'm afraid not, I'm stuck supporting 2.0 (this is a library). — Lilith River, Apr 06 '11 at 12:56
@Jeff I originally used that approach, but I ran into frequent locking issues which I never figured out. I'm not sure how windows handles file locks, but I have yet to see it act correctly. I also need atomicity outside of the file lock, for example, when I delete a file. — Lilith River, Apr 06 '11 at 12:58
I don't know what sort of problems you were having, but the file system is designed to handle exactly this sort of thing. There is even an atomic delete-on-close feature. You may have to use P/Invoke to access all of these features, but I think it would be way better than writing your own, both in terms of correctness and performance. — Jeffrey L Whitledge, Apr 07 '11 at 20:17
have you considered the fact that a file can have more than one name? — David Heffernan, Apr 07 '11 at 20:24
@David: Yes, all filenames are generated and normalized prior to usage with LockProvider. @Jeffery: This library has to be capable of running under low trust, and while many hosts permit file writes, they do not allow p/invoke. — Lilith River, Apr 08 '11 at 10:29
@Computer Linguist: Had any luck with your riddle? Did my answer help? — Steven, Apr 21 '11 at 13:32
Thanks for you help, but your solution doesn't solve the original loophole: the removal of the lock object from the dictionary while a reference has already been copied by another caller. See my comment on your answer. — Lilith River, Apr 21 '11 at 14:34
@Computer Linguist: Hmmm... I'll try to give a look tonight. I found multithreading problems, uhh riddles rather interesting :-) — Steven, Apr 21 '11 at 14:50

score 1 · Answer 1 · answered Apr 06 '11 at 12:11

Depending on what you are doing with the files (you say disk based caching so I assume reads as well as writes) then I would suggest trying something based upon ReaderWriterLock, if you can upgrade to .Net 3.5 then try ReaderWriterLockSlim instead as it performs much better.

As a general step to reducing the potential endless recursion case in your example change the first bit of the code to the following:

do 
{
    // 1) Creation/aquire phase
    lock (createLock){
        // We have to lock on dictionary writes, since otherwise 
        // two locks for the same file could be created and assigned
        // at the same time. (i.e, between TryGetValue and the assignment)
        if (!locks.TryGetValue(key, out itemLock)) 
            locks[key] = itemLock = new Object(); //make a new lock!

    }
    // Loophole (part 1):
    // Right here - this is where another thread could remove 'itemLock'
    // from the dictionary, and potentially, yet another thread could 
    // insert a new value for 'itemLock' into the dictionary... etc, etc..

    // 2) Execute phase
    lock(itemLock){ 
        // May take minutes to acquire this lock. 
        // Real version would specify a timeout and a failure callback.

        // Trying to detect an occurence of loophole above
        // Check that itemLock still exists and matches the dictionary
        lock(createLock){
            object newLock = null;
            validLock = locks.TryGetValue(key, out newLock);
            validLock = validLock && newLock == itemLock;
        }
        // Only run the callback if the lock is valid
        if (validLock) callback(); // Extremely long-running callback. 
    }
    // If we had an invalid lock, we have to try everything over again.
} while (!validLock);

This replaces your recursion with a loop which avoids any chance of a StackOverflow by endless recursion.

Do you notice any other potential loopholes with the implementation? — Lilith River, Apr 06 '11 at 12:55
All callbacks are writes. Reads are uncontrolled, I'm letting the file system handle that. IIS is doing all the reads anyway, my hands are tied. — Lilith River, Apr 06 '11 at 13:55

Steven · Answer 2 · 2011-04-23T07:54:59.980

1

That solution sure looks brittle and complex. Having public callbacks inside locks is bad practice. Why won't you let LockProvider return some sort of 'lock' objects, so that the consumers do the lock themselves. This separates the locking of the locks dictionary from the execution. It might look like this:

public class LockProvider
{
    private readonly object globalLock = new object();

    private readonly Dictionary<String, Locker> locks =
        new Dictionary<string, Locker>(StringComparer.Ordinal);

    public IDisposable Enter(string key)
    {
        Locker locker;

        lock (this.globalLock)
        {
            if (!this.locks.TryGetValue(key, out locker))
            {
                this.locks[key] = locker = new Locker(this, key);
            }

            // Increase wait count ínside the global lock
            locker.WaitCount++;
        }

        // Call Enter and decrease wait count óutside the 
        // global lock (to prevent deadlocks).
        locker.Enter();

        // Only one thread will be here at a time for a given locker.
        locker.WaitCount--;

        return locker;
    }

    private sealed class Locker : IDisposable
    {
        private readonly LockProvider provider;
        private readonly string key;
        private object keyLock = new object();

        public int WaitCount;

        public Locker(LockProvider provider, string key)
        {
            this.provider = provider;
            this.key = key;
        }

        public void Enter()
        {
            Monitor.Enter(this.keyLock);
        }

        public void Dispose()
        {
            if (this.keyLock != null)
            {
                this.Exit();
                this.keyLock = null;
            }
        }

        private void Exit()
        {
            lock (this.provider.globalLock)
            {
                try
                {
                    // Remove the key before releasing the lock, but
                    // only when no threads are waiting (because they
                    // will have a reference to this locker).
                    if (this.WaitCount == 0)
                    {
                        this.provider.locks.Remove(this.key);
                    }
                }
                finally
                {
                    // Release the keyLock inside the globalLock.
                    Monitor.Exit(this.keyLock);
                }
            }
        }
    }
}

And the LockProvider can be used as follows:

public class Consumer
{
    private LockProvider provider;

    public void DoStufOnFile(string fileName)
    {
        using (this.provider.Enter(fileName))
        {
            // Long running operation on file here.
        }
    }
}

Note that Monitor.Enter is called before we enter the try statement (using), which means in certain host environments (such as ASP.NET and SQL Server) we have the possibility of locks never being released when an asynchronous exception happens. Hosts like ASP.NET and SQL Server aggressively kill threads when timeouts occur. Rewriting this with the Enter outside the Monitor.Enter inside the try is a bit tricky though.

I hope this helps.

edited Apr 23 '11 at 07:54

answered Apr 06 '11 at 12:11

Steven

166,672
24
332
435

I originally used that approach, but it is subject to the same loophole if you add a DisposeLockObject() method per my requirements. – Lilith River Apr 06 '11 at 12:50
I'm not sure if I agree that having callbacks inside locks is a bad practice. An API which exposes callbacks is always simpler and less prone to usage errors than one which directly exposes locks. The only error a caller can make in a callback is a recursive call, something which is both obvious and simple, and easily documented. – Lilith River Apr 06 '11 at 12:52
@Computer: The advice not to do callbacks inside a lock is something that is explicitly stated in the Framework Design Guidelines. – Steven Apr 06 '11 at 12:57
@Computer: What exactly does this "DisposeLockObject" object do and what are the requirements? I'm not sure I follow you. – Steven Apr 06 '11 at 12:59
A critical requirement in my situation is that the only locks present in the dictionary must be open locks. Locks must be removed (Garbage collected, if you will) from the dictionary after they are no longer in use. To quote: "(Only in-use locks can be present in the dict)." – Lilith River Apr 06 '11 at 13:37
@Steven. Is there any reasoning behind said advice? I have found that most microsoft guidelines are poorly thought out, but I am willing to learn if given logical basis. – Lilith River Apr 06 '11 at 13:41
I just realized that even a recursive call from within a callback will not deadlock, since Monitor allows re-entrant behavior. – Lilith River Apr 06 '11 at 13:43
@Computer: Oh yes, how code I have missed that :-S. See my new proposal. I think this code is consistent and solve your problem. – Steven Apr 06 '11 at 15:29
@Steven: I like the approach. Some rambling: Why did you get rid of IDisposable? `GetLockObjectForKey` isn't creating a `Locker` or returning a value. Any reason for locking on `dictionaryLock` object instead of `IDictionary.SyncRoot`? Or just preference? Any reason for `GetLockObjectForKey` not to call `Enter` before returning the Locker? Or maybe another method called `GetLock(string)` that gets a locker and calls Enter before returning? It would remove the requirement that the consumer call Enter every time. I guess if the consumer calls `Enter` the lock will be slightly shorter. – Randy Levy Apr 06 '11 at 19:25
@Tuzo: I forgot to return a new `Locker` object; fixed that. I could have done a `lock(this.locks)` as well, just a habbit. As long as you lock on something private. For instance, don't do `lock(this)`. – Steven Apr 07 '11 at 05:42
There certainly is a good reason not to call `Enter` inside the `GetLockObjectForKey` method. In that case `Enter` would be called before entering the `try` statement and in the case of an asynchronous exception in that time frame, the lock would stay open for ever. You see this pattern at more places in the .NET framework. `SqlConnection` for instance, has an `Open` method. – Steven Apr 07 '11 at 05:46
@Steven: This still doesn't solve the fundamental problem mentioned in the original question. If the lock is removed from the dictionary between GetLockObjectForKey() and locker.Enter(), two separate lock objects will exist simultaneously, allowing parallel execution of two threads working simultaneously on the same file. – Lilith River Apr 21 '11 at 14:32
@Computer Linguist: You are absolutely right. How could I have missed this. That's why we need pair programming ;-) We need some sort of wait counter to make sure we are allowed to remove the item. I will update my answer this weekend. Cheers – Steven Apr 22 '11 at 22:36
@Computer Linguist: I've updated my answer. It now uses a field that counts the threads that are waiting to enter the lock. I think this effectively solves the fundamental problem. – Steven Apr 23 '11 at 07:56
Yes, I think it does. When you add timeouts to the locking, though, the pattern breaks down a bit, though - the assumption the cleanup routine makes is that all locks are successful. – Lilith River Apr 23 '11 at 11:09

Joe · Answer 3 · 2011-04-07T06:44:22.317

Could you not simply used a named Mutex, with the name derived from your filename?

Although not a lightweight synchronization primitive, it's simpler than managing your own synchronized dictionary.

However if you really do want to do it this way, I'd have thought the following implementation looks simpler. You need a synchonized dictionary - either the .NET 4 ConcurrentDictionary or your own implementation if you're on .NET 3.5 or lower.

try
{
    object myLock = new object();
    lock(myLock)
    {
        object otherLock = null;
        while(otherLock != myLock)
        {
           otherLock = lockDictionary.GetOrAdd(key, myLock);
           if (otherLock != myLock)
           {
               // Another thread has a lock in the dictionary
               if (Monitor.TryEnter(otherLock, timeoutMs))
               {
                   // Another thread still has a lock after a timeout
                   failure();
                   return;
               }
               else
               {
                   Monitor.Exit(otherLock);
               }
           }
        }
        // We've successfully added myLock to the dictionary
        try
        {
           // Do our stuff
           success();
        }
        finally
        {
            lockDictionary.Remove(key);
        }
    }
}

I think you mean !Monitor.TryEnter, but I like your approach. The only possible downside I see is that a lot of reordering would happen in a contention situation, vs. the original approach only reordering on a worse-case coincidence... — Lilith River, Apr 08 '11 at 10:40

Lilith River · Accepted Answer · 2011-12-09T07:08:49.500

There doesn't seem to be an elegant way to do this in .NET, although I have improved the algorithm thanks to @RobV's suggestion of a loop. Here is the final solution I settled on.

It is immune to the 'orphaned reference' bug that seems to be typical of the standard pattern followed by @Steven's answer.

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

namespace ImageResizer.Plugins.DiskCache {

public delegate void LockCallback();
/// <summary>
/// Provides locking based on a string key. 
/// Locks are local to the LockProvider instance.
/// The class handles disposing of unused locks. Generally used for 
/// coordinating writes to files (of which there can be millions). 
/// Only keeps key/lock pairs in memory which are in use.
/// Thread-safe.
/// </summary>
public class LockProvider {

    /// <summary>
    /// The only objects in this collection should be for open files. 
    /// </summary>
    protected Dictionary<String, Object> locks = 
                    new Dictionary<string, object>(StringComparer.Ordinal);
    /// <summary>
    /// Synchronization object for modifications to the 'locks' dictionary
    /// </summary>
    protected object createLock = new object();
    /// <summary>
    /// Attempts to execute the 'success' callback inside a lock based on 'key'.  If successful, returns true.
    /// If the lock cannot be acquired within 'timoutMs', returns false
    /// In a worst-case scenario, it could take up to twice as long as 'timeoutMs' to return false.
    /// </summary>
    /// <param name="key"></param>
    /// <param name="success"></param>
    /// <param name="failure"></param>
    /// <param name="timeoutMs"></param>
    public bool TryExecute(string key, int timeoutMs, LockCallback success){
        //Record when we started. We don't want an infinite loop.
        DateTime startedAt = DateTime.UtcNow;

        // Tracks whether the lock acquired is still correct
        bool validLock = true; 
        // The lock corresponding to 'key'
        object itemLock = null;

        try {
            //We have to loop until we get a valid lock and it stays valid until we lock it.
            do {
                // 1) Creation/aquire phase
                lock (createLock) {
                    // We have to lock on dictionary writes, since otherwise 
                    // two locks for the same file could be created and assigned
                    // at the same time. (i.e, between TryGetValue and the assignment)
                    if (!locks.TryGetValue(key, out itemLock))
                        locks[key] = itemLock = new Object(); //make a new lock!

                }
                // Loophole (part 1):
                // Right here - this is where another thread (executing part 2) could remove 'itemLock'
                // from the dictionary, and potentially, yet another thread could 
                // insert a new value for 'itemLock' into the dictionary... etc, etc..

                // 2) Execute phase
                if (System.Threading.Monitor.TryEnter(itemLock, timeoutMs)) {
                    try {
                        // May take minutes to acquire this lock. 

                        // Trying to detect an occurence of loophole above
                        // Check that itemLock still exists and matches the dictionary
                        lock (createLock) {
                            object newLock = null;
                            validLock = locks.TryGetValue(key, out newLock);
                            validLock = validLock && newLock == itemLock;
                        }
                        // Only run the callback if the lock is valid
                        if (validLock) {
                            success(); // Extremely long-running callback, perhaps throwing exceptions
                            return true;
                        }

                    } finally {
                        System.Threading.Monitor.Exit(itemLock);//release lock
                    }
                } else {
                    validLock = false; //So the finally clause doesn't try to clean up the lock, someone else will do that.
                    return false; //Someone else had the lock, they can clean it up.
                }

                //Are we out of time, still having an invalid lock?
                if (!validLock && Math.Abs(DateTime.UtcNow.Subtract(startedAt).TotalMilliseconds) > timeoutMs) {
                    //We failed to get a valid lock in time. 
                    return false;
                }


                // If we had an invalid lock, we have to try everything over again.
            } while (!validLock);
        } finally {
            if (validLock) {
                // Loophole (part 2). When loophole part 1 and 2 cross paths,
                // An lock object may be removed before being used, and be orphaned

                // 3) Cleanup phase - Attempt cleanup of lock objects so we don't 
                //   have a *very* large and slow dictionary.
                lock (createLock) {
                    //  TryEnter() fails instead of waiting. 
                    //  A normal lock would cause a deadlock with phase 2. 
                    //  Specifying a timeout would add great and pointless overhead.
                    //  Whoever has the lock will clean it up also.
                    if (System.Threading.Monitor.TryEnter(itemLock)) {
                        try {
                            // It succeeds, so no-one else is working on it 
                            // (but may be preparing to, see loophole)
                            // Only remove the lock object if it 
                            // still exists in the dictionary as-is
                            object existingLock = null;
                            if (locks.TryGetValue(key, out existingLock)
                                && existingLock == itemLock)
                                locks.Remove(key);
                        } finally {
                            // Remove the lock
                            System.Threading.Monitor.Exit(itemLock);
                        }
                    }
                }
            }
        }
        // Ideally the only objects in 'locks' will be open operations now.
        return true;
    }
}
}

Consuming this code is very simple:

LockProvider p = new LockProvider();
bool success = p.TryExecute("filename",1000,delegate(){
  //This code executes within the lock
});

Better solution to multithreading riddle?

4 Answers4

Linked