11

ConcurrentDictionary.GetOrAdd(TKey, Func<TKey, TValue>) accepts a factory function to allow lazy instantiation of the item to be put into the dictionary.

Is it safe to define a factory function that itself calls GetOrAdd(), i.e. GetOrAdd is being called within the context of a 'parent' GetOrAdd().

The following code demonstrates the pattern; It does appear to work, but is it safe?

class Program
{
    static ConcurrentDictionary<string,object> __dict = new ConcurrentDictionary<string, object>();

    static void Main(string[] args)
    {
        Foo foo = GetOrAddFoo();
        Console.WriteLine(foo._name);
        Console.WriteLine(foo._bar._name);
        Console.ReadKey();
    }

    static Bar GetOrAddBar()
    {
        Console.WriteLine("GetOrAddBar: enter");
        Func<string,Bar> factoryFn = (x) => LoadBar(x);
        Bar bar = __dict.GetOrAdd("bar", factoryFn) as Bar;
        Console.WriteLine("GetOrAddBar: exit");
        return bar;
    }

    static Foo GetOrAddFoo()
    {
        Console.WriteLine("GetOrAddFoo: enter");
        Func<string,Foo> factoryFn = (x) => LoadFoo(x);
        Foo foo = __dict.GetOrAdd("foo", factoryFn) as Foo;
        Console.WriteLine("GetOrAddFoo: exit");
        return foo;
    }

    static Bar LoadBar(string name)
    {
        Bar bar =  new Bar();
        bar._name = name;
        return bar;
    }

    static Foo LoadFoo(string name)
    {
        Foo foo = new Foo();
        foo._name = name;
        foo._bar = GetOrAddBar();
        return foo;
    }

    public class Foo
    {
       public string _name;
       public Bar _bar;
    }

    public class Bar
    {
        public string _name;
    }
}
redcalx
  • 8,177
  • 4
  • 56
  • 105

3 Answers3

3

The answer is yes, it is perfectly safe. The value functions are only called if the given key does not exist yet. Below is a walkthrough of what happens under the hood.

Walkthrough

First we assume the dictionary is completely empty, which I'll only show the keys in an array format for simplicity:

dictionary = []

On an initial first time execution of the GetOrAddFoo method, the "Foo" key does not exist so the dictionary invokes the value function, which in this invocation is the LoadFoo method. The dictionary is still empty here.

dictionary = []

LoadFoo internally calls GetOrAddBar, which checks and finds that the "Bar" key does not exist, so the LoadBar value function is invoked and it returns with the created "Bar" entry. The dictionary at this point looks like this:

dictionary = ["Bar"]

The dictionary at this point contains the item "Bar" in it. We have not yet completed the LoadFoo value function, but it will now.

dictionary = ["Bar"]

The LoadFoo method regains control and returns a Foo object to be stored in the dictionary. After LoadFoo completes, GetOrAddFoo can complete too. We now have the dictionary looking like this:

dictionary = ["Bar", "Foo"]

Future Calls of GetOrAddFoo

On subsequent calls to GetOrAddFoo, the dictionary already has an entry for Foo and thus will not invoke its value function, or even the Bar value function either. It returns immediately.

dictionary = ["Bar", "Foo"]

But what happens if we remove Bar from the dictionary and then call GetOrAddFoo? Say that we do remove it, leaving our dictionary like this:

dictionary = ["Foo"]

And now we call GetOrAddFoo again. Foo still exists, so the dictionary will not call the LoadFoo value function. And thus, Bar is not re-added to the dictionary. Our dictionary remains the same:

dictionary = ["Foo"]

If we call GetOrAddBar directly though, we'll get "Bar" added back in then.

dictionary = ["Foo", "Bar"]

Under the hood of the Concurrent Dictionary's Get-Or-Add method

Under the hood, anytime the GetOrAdd method is called, the existence of the given key is checked first.

If it does NOT exist at that point, the value function is then invoked. AFTER the value function returns, a lock is placed on the dictionary to allow the new entry to be added.

In a multi-threaded world, we could have 2 threads trying to add the same exact key to the dictionary. The dictionary would run the value function, then try to acquire the lock, and after getting a lock, check if the key exists again.

The reason for this is on lock retrieval, another thread could jump in and add the same key. The original thread would need to check for this scenario once it receives the lock, so that it doesn't end up causing a key collision.

ajawad987
  • 4,439
  • 2
  • 28
  • 45
  • This is an interesting answer, but it does not explain why accessing the dictionary from the factory would be safe. When the factory is called, the dictionary is in the middle of updating itself, and may or may not be prepared to handle re-entrancy. – Antoine Aubry Jun 03 '20 at 10:50
  • The dictionary does not lock until the factory actually finishes. If factory A chains to call factory B, the dictionary would lock when B returns so it could add B's value to the dictionary. Then A would be allowed to resume and the dictionary acquires a new lock so A's value could be added. – ajawad987 Jun 03 '20 at 18:46
  • The only caveat is that the value factories have to account for values being added in "reverse" order in this scenario, since the concept it uses is like how a "stack" data type works... in this case its a stack of value factory delegates. – ajawad987 Jun 03 '20 at 18:47
2

When you decompile ConcurrentDictionary.GetOrAdd(TKey, Func) you will see line:

TryAddInternal(key, valueFactory(key), updateIfExists: false, acquireLock: true, out value);

Which means that timeline of your call in one process/thread will be:

enter __dict.GetOrAdd("foo", factoryFn)
enter LoadFoo (from valueFactory(key) in line above)
enter GetOrAddBar
enter __dict.GetOrAdd("bar", factoryFn)
enter LoadBar
leave LoadBar
enter TryAddInternal with key "bar" 
acquire lock ( Monitor.Enter(tables.m_locks[lockNo], ref lockTaken); )
add key "bar" with appropriate value
release lock ( Monitor.Exit(tables.m_locks[lockNo]); )
leave TryAddInternal with key "bar" 
leave __dict.GetOrAdd("bar", factoryFn)
leave GetOrAddBar
leave LoadFoo
enter TryAddInternal with key "foo" 
acquire lock ( Monitor.Enter(tables.m_locks[lockNo], ref lockTaken); )
add key "foo" with appropriate value
release lock ( Monitor.Exit(tables.m_locks[lockNo]); )
leave __dict.GetOrAdd("foo", factoryFn)

You can see that it will lock and release twice and in between another process can jump in and create "foo" when this process have already created "bar". If it's safe depends on you.

And when you'd call recursively with the same key, most in-depth value will "win" because there is updateIfExists: false param in TryAddInternal so any subsequent call will not change it once it's there. There is also out value param so it will return first inserted value and will not fail.

Also interesting: TryAddInternal does not lock whole Dictionary but only a bucket (part of the dicionary) based on the key. It's performance improvement.

ElMent
  • 158
  • 2
  • 11
  • I like the fact that you are using the decompiled code as reference. Although the implementation could change, it is unlikely to happen for such a core class, unless some bug needs to be fixed. It is difficult to follow the timeline as you wrote it. Maybe some indentation would help, and also it is not clear what some method names mean. – Antoine Aubry Jun 04 '20 at 08:25
-1

The valueFactory delegate is invoked and it arrives at the internal lock. (based on MSDN)

So it is quite safe as soon as your delegate is simple and don't access anything not thread safe. I mean that if your method is just create some class, there is no danger. If your method will try read some fields of class or do some locking, you might have to consider how to solve deadlock issues.

I assume this internal lock is lock on __dict[key]

justromagod
  • 933
  • 9
  • 20
  • I don't think that this correctly answers the question, as the documentation states that "... the valueFactory delegate is called outside the locks to avoid the problems that can arise from executing unknown code under a lock.". – Antoine Aubry May 28 '20 at 18:05