26

Let's take Wes Dyer's approach to function memoization as the starting point:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  return a =>
    {
      R value;
      if (map.TryGetValue(a, out value))
        return value;
      value = f(a);
      map.Add(a, value);
      return value;
    };
}

The problem is, when using it from multiple threads, we can get into trouble:

Func<int, int> f = ...
var f1 = f.Memoize();
...
in thread 1:
var y1 = f1(1);
in thread 2:
var y2 = f1(1);
// We may be recalculating f(1) here!

Let's try to avoid this. Locking on map:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  return a =>
    {
      R value;
      lock(map) 
      {
        if (map.TryGetValue(a, out value))
          return value;
        value = f(a);
        map.Add(a, value);
      }
        return value;
    };
}

is clearly a horrible idea, because it prevents us from calculating f1 on many different arguments at once. Locking on a won't work if a has a value type (and at any rate is a bad idea, since we don't control a and outside code may lock on it, too).

Here are two options I can think of:

Assuming a Lazy<T> class for lazy evaluation (see here):

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, Lazy<R>>();
  return a =>
    {
      Lazy<R> result;
      lock(map) 
      {
        if (!map.TryGetValue(a, out result))
        {  
          result = () => f(a);
          map.Add(a, result);
        }
      }
      return result.Value;
    };
}

Or keeping an additional dictionary of objects for synchronization:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new Dictionary<A, R>();
  var mapSync = new Dictionary<A, object>();
  return a =>
    {
      R value;
      object sync;
      lock(mapSync)
      { 
        if (!mapSync.TryGetValue(a, out sync))
        { 
          sync = new object();
          mapSync[a] = sync;
        }
      }
      lock(map)
      {
        if (map.TryGetValue(a, out value))
          return value;
      }
      lock(sync)
      {
        value = f(a);
        lock(map)
        {
          map[a] = value;
        }
        return value;
      }
    };
}

Any better options?

Alexey Romanov
  • 167,066
  • 35
  • 309
  • 487

7 Answers7

44

Use .net 4.0's ConcurrentDictionary<A, R> without the unnecessary Lazy<R>.
The key is GetOrAdd(A, Func<A, R>) which renders into a beautifully trivial lambda.

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
    var cache = new ConcurrentDictionary<A, R>();
    return a => cache.GetOrAdd(a, f);
};

Update The above solution does allow multiple simultaneous readers & writers with the minimum of overhead. But, it doesn't prevent f(a) from being executed more than once for the same value (during the period while it is being calculated).

If that is vital to you, you could wrap the value in Lazy<R> but you incur a cost for every read.

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
    var cache = new ConcurrentDictionary<A, Lazy<R>>();
    return a => cache.GetOrAdd(a, new Lazy<R>(() => f(a))).Value;
}

Update Timing tests for a million reads of a pre-populated 1000-item cache show 19ms for ConcurrentDictionary -- same as regular Dictionary -- but 720ms for the Lazy version.

If that sounds too steep, you can get the best of both worlds with a more complex solution.

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
    var cache = new ConcurrentDictionary<A, R>();
    var syncMap = new ConcurrentDictionary<A, object>();
    return a =>
    {
        R r;
        if (!cache.TryGetValue(a, out r))
        {
            var sync = syncMap.GetOrAdd(a, new object());
            lock (sync)
            {
                r = cache.GetOrAdd(a, f);
            }
            syncMap.TryRemove(a, out sync);
        }
        return r;
    };
}
Nigel Touch
  • 1,946
  • 2
  • 19
  • 12
10

If you already have that Lazy<T> type, I assume you're using .net 4.0, so you could also use the ConcurrentDictionary<A,R>:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
  var map = new ConcurrentDictionary<A, Lazy<R>>();
  return a =>
    {
      Lazy<R> lazy = new Lazy<R>(() => f(a), LazyExecutionMode.EnsureSingleThreadSafeExecution);
      if(!map.TryAdd(a, lazy))
      {
        return map[a].Value;
      }
      return lazy.Value;
    };
}
Ronald Wildenberg
  • 31,634
  • 14
  • 90
  • 133
Thomas Danecker
  • 4,635
  • 4
  • 32
  • 31
3

Expanding on Nigel Touch's excellent answer, I wanted to offer a reusable component extracted from his solution limiting the invocation count for f(a).

I called it SynchronizedConcurrentDictionary, and it looks like this:

public class SynchronizedConcurrentDictionary<TKey, TValue> : ConcurrentDictionary<TKey, TValue>
{
    private readonly ReaderWriterLockSlim _cacheLock = new ReaderWriterLockSlim();

    public new TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        TValue result;

        _cacheLock.EnterWriteLock();
        try
        {
            result = base.GetOrAdd(key, valueFactory);
        }
        finally
        {
            _cacheLock.ExitWriteLock();
        }

        return result;
    }
}

Then the Memoize function becomes a two-liner:

public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
    var cache = new SynchronizedConcurrentDictionary<A, R>();

    return key => cache.GetOrAdd(key, f);
}

Cheers!

bopapa_1979
  • 8,949
  • 10
  • 51
  • 76
  • 2
    Why the downvote with no comment? I was just trying to provide something I derived and found useful to the community. What's the problem? – bopapa_1979 Jun 11 '15 at 20:51
  • NOTE: The name "SynchronizedConcurrentDictionary" is probably a bad one! ConcurrentDictionary implements ICollection, which has a property "IsSynchronized" which gets a value indicating whether access to the ICollection is synchronized (thread safe). ConcurrentDictionary returns false from this property, and the SyncRoot property throws an exception if you try to read it. The name "SynchronizedConcurrentDictionary" could be interpreted to imply that the collection is synchronized via SyncRoot, which is false. – bopapa_1979 Jun 18 '15 at 16:00
2

Thomas's answer does not seem to compile under .NET 4.0 due to the enum parameter to the Lazy constructor. I revised it below. I also added an optional parameter for supplying one's own equality comparer. This is useful if TInput does not implement its own Equals or if TInput is a string and you want to make it case insensitive, for example.

    public static Func<TInput, TResult> Memoize<TInput, TResult>(
        this Func<TInput, TResult> func, IEqualityComparer<TInput> comparer = null)
    {
        var map = comparer == null
                      ? new ConcurrentDictionary<TInput, Lazy<TResult>>()
                      : new ConcurrentDictionary<TInput, Lazy<TResult>>(comparer);

        return input =>
               {
                   var lazy = new Lazy<TResult>(() => func(input), LazyThreadSafetyMode.ExecutionAndPublication);

                   return map.TryAdd(input, lazy)
                              ? lazy.Value
                              : map[input].Value;
               };
    }

I did some basic testing of this method using this as my test:

    public void TestMemoize()
    {
        Func<int, string> mainFunc = i =>
                                     {
                                         Console.WriteLine("Evaluating " + i);
                                         Thread.Sleep(1000);
                                         return i.ToString();
                                     };

        var memoized = mainFunc.Memoize();

        Parallel.ForEach(
            Enumerable.Range(0, 10),
            i => Parallel.ForEach(Enumerable.Range(0, 10), j => Console.WriteLine(memoized(i))));
    }

It seems to be working correctly.

Jim Bolla
  • 8,265
  • 36
  • 54
1

No, they are not better options.

The version with the lazy evaluation is pointless as you evaluate it immediately anyway. The version with the synchronisation dictionary doesn't work properly as you are not protecting the map dictionary inside a lock before using it.

The version that you called horrible is actually the best option. You have to protect the map dictionary inside a lock so that only one thread at a time can access it. The dictionary is not thread safe, so if you let one thread read from it while another thread is changing it, you will have problems.

Remember that using lock on the map object doesn't protect the map object in itself, it's only using the map reference as an identifier to keep more than one thread at a time to run the code inside the lock. You have to put all code that accesses the object inside the lock, not just the code that is changing the object.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • I've fixed the lazy evaluation version. – Alexey Romanov Aug 10 '09 at 14:39
  • And the synchronisation dictionary version. – Alexey Romanov Aug 10 '09 at 14:47
  • The lazy evaluation version still is pointess as the value is always evaluated immediately. The synchronisation dictionary version still is not safe, as different threads can create objects for the same key, and one will overwrite the other. – Guffa Aug 10 '09 at 23:51
  • Lazy evaluation: the threads get the same `Lazy` reference, and thus only one will actually evaluate it. Synchronisation dictionary: how can different threads create objects for the same key under `lock(mapSync)`? – Alexey Romanov Aug 11 '09 at 18:10
  • No, more than one thread may actually evaluate the lazy object as it's not synchronised, but that is not the point. Making the object lazy is pointless as it's always evaluated in the next line. In the version with the synchronisation dictionary there is a gap between the lock where you have determined that the object is not in the map and the lock where you create the object. In that gap another thread can determine that it's not in the map and add it to the map. It's a very small gap, but it's there. – Guffa Aug 11 '09 at 18:46
1

You don't want to calculate the same value twice and you want many threads to be able to calculate values and or retrieve values concurrently. To do this you will need to use some sort of condition variable and fine grained locking system.

Heres the idea. when no value is present you put a value into the sync map and then any thread who needs that value will wait for it otherwise you will just grab the current value. this way locking of the map is minimized to querying for values and returning values.

    public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
    {
        var map = new Dictionary<A, R>();
        var mapSync = new Dictionary<A, object>();
        return a =>
        {
            R value;
            object sync = null;
            bool calc = false;
            bool wait = false;
            lock (map)
            {
                if (!map.TryGetValue(a, out value))
                {
                    //its not in the map
                    if (!mapSync.TryGetValue(a, out sync))
                    {
                        //not currently being created
                        sync = new object();
                        mapSync[a] = sync;
                        calc = true;

                    }
                    else
                    {
                        calc = false;
                        wait = true;
                    }
                }
            }
            if(calc)
            {
                lock (sync)
                {
                    value = f(a);
                    lock (map)
                    {
                        map.Add(a, value);
                        mapSync.Remove(a);
                    }
                    Monitor.PulseAll(sync);
                    return value;
                }
            }
            else if (wait)
            {
                lock (sync)
                {
                    while (!map.TryGetValue(a, out value))
                    {
                        Monitor.Wait(sync);
                    }
                    return value;
                }
            }

            lock (map)
            {
                return map[a];
            }

        };
    }

This is just a quick first try but i think it demonstrates the technique. Here you are trading additional memory for speed.

luke
  • 14,518
  • 4
  • 46
  • 57
0

Did you read the comment from Dyer related to thread-safe in the article?

Probably the easiest way to make Memoize thread-safe is to put a lock on map.

This will ensure that the function that is being memoized will only be run once for each set of distinct arguments.

In my example of the RoboRally game, I actually used function memoization to act as "surrogate singleton". It isn't really a singleton since there can be one instance per factory instance (unless the factory is static). But that is exactly what I wanted.

Community
  • 1
  • 1
Armbrat
  • 2,295
  • 1
  • 23
  • 32
  • Yes, that's the _easiest_ way. I specifically said what's bad about it: it stops us from evaluating the function on different arguments concurrently as well. – Alexey Romanov Aug 10 '09 at 14:33