3

I know this topic (memoization) has been discussed quite a bit (like here), but no answer I could find satisfies the DRY principle as much as I would like, so please, read this whole question and the three points I want to address.

I have a simple support class like this:

public class Memoized<T1, TResult>
{
    private readonly Func<T1, TResult> _f;

    private readonly Dictionary<T1, TResult> _cache = new Dictionary<T1, TResult>();

    public Memoized(Func<T1, TResult> f)
    {
        _f = f;
    }

    public TResult Invoke(T1 p1)
    {
        if (p1 == null) throw new ArgumentNullException(nameof(p1));

        if (_cache.TryGetValue(p1, out var res)) return res;

        return _cache[p1] = _f(p1);
    }
    public static Func<T1, TResult> Of(Func<T1, TResult> f)
    {
        var memo = new Memoized<T1, TResult>(f);

        return x => memo.Invoke(x);
    }
}

Nothing especially fancy, but it allows me to do this:

public class MyClass
{
    public Func<int, bool> MemoizedMethod { get; }

    private bool UncachedMethod(int v)
    {
        return v > 0;
    }

    public MyClass()
    {
        MemoizedMethod = Memoized<int, bool>.Of(UncachedMethod);
    }
}

Now, even if the resulting code is not terribly noisy, I'm trying to figure out if the implementation could be more elegant, because currently I need:

  1. an invokable property that acts as a Method.
  2. a true method that should not be called directly.
  3. a line in the constructor (cannot be an inline initialization) that links the two (with a third repetition of the function signature!).

Any suggestion for a strategy that allows to remove one (or two!) of the above points would be great.

Alberto Chiesa
  • 7,022
  • 2
  • 26
  • 53
  • 2
    It doesn't necessarily make sense to implement memoization in a generic way, rather this is something you should do on a case-by-case basis where you expect to see a performance improvement. I.e. when the cost of maintaining a cache of memoized results is less than the cost of recalculating them. – Ben Nov 13 '18 at 16:27
  • Uh, no. It needs to be generic: because I will need it many times. Not everywhere, obviously, but a lot of times, yes (I'm working on a 40k LOC project). So, I need a syntax that is the most compact and error free. In a word, elegant. I dont' want to fill all my classes with caching dictionaries and repeated structure. – Alberto Chiesa Nov 13 '18 at 16:29
  • 1
    This looks like [Lazy](https://learn.microsoft.com/en-us/dotnet/api/system.lazy-1?view=netcore-2.1) – Silvermind Nov 13 '18 at 16:30
  • 1
    `ConcurrentDictionary` is all you need. Make a key out of your argument, then just do return `internalDictionary.GetOrAdd(argumentKey, internalFunction)` – Ben Nov 13 '18 at 16:30
  • 1
    @Ben You're assuming a requirement of multithreaded access that doesn't appear to be there. – Servy Nov 13 '18 at 16:32
  • @Silvermind, would you mind explaining how you would use `Lazy`? @Ben `ConcurrentDictionary` is similar, but still you need one property (the dictionary), one method(the implementation). Possibly you get rid of the inizialitazion in the constructor? – Alberto Chiesa Nov 13 '18 at 16:35
  • 1
    @Ben Synchronizing access from multiple threads has a cost. Potentially a very high cost. Adding that cost when it's not needed is very ill advised. – Servy Nov 13 '18 at 16:41
  • @Ben, I'm fine using ConcurrentDictionary: it seems reasonable. But I'm really interested in getting the _usage syntax_ right: repeating the same signature 3 times seems just wrong to me. – Alberto Chiesa Nov 13 '18 at 16:41
  • Yes, unfortunately, you need the types. Even using a static, non generic support class with a generic method doesn't get the job done. – Alberto Chiesa Nov 13 '18 at 16:44
  • @Ben I already tried it and it works in this simplistic example, but you can't use non-static properties in the code (no `this`). So it is overly restrictive for real life implementations. – Alberto Chiesa Nov 13 '18 at 16:51
  • 2
    You might look here: https://stackoverflow.com/questions/2852161/c-sharp-memoization-of-functions-with-arbitrary-number-of-arguments – Eric Lippert Nov 13 '18 at 20:17
  • 1
    @Silvermind: `Lazy` is best when you have a single value that must be computed lazily from a `Func` Memoization is used to solve the more general problem of I have a `Func` and wish to make a new `Func` that calls the original if the R for a particular A has not been computed, and then stores the result for next time. – Eric Lippert Nov 13 '18 at 20:18
  • @EricLippert Yes, I was looking into the definition of the term and based upon reading the post I figured I must be missing something. Thanks for the clarification. – Silvermind Nov 13 '18 at 21:31
  • 2
    [This question is being discussed on meta.](https://meta.stackoverflow.com/questions/376646/is-searching-for-duplicates-becoming-a-biased-toward-closing) – Script47 Nov 14 '18 at 14:54
  • If you want to be elegant af, steal the concept of using Castle to proxy a class from Moq. You can intercept actual method calls and property uses and memoize the hell out of it. –  Nov 14 '18 at 15:59
  • @Will: [System.Runtime.Remoting.Proxies.RealProxy](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.remoting.proxies.realproxy) can accomplish that pretty cleanly, but forces you to extract an interface from the relevant classes. If you prefer to do massive setup and then have a generalized, easily used memoize attribute, an alternative approach is to write a post-processing tool that modifies your compiled code to wrap it in Memoizers. The latter approach is pretty straightforward, if you don't mind using expensive (~$600/developer) third-party tools. – Brian Nov 14 '18 at 22:51

2 Answers2

9

In my struggle for elegance, I finally found what I think is the best syntax I saw anywhere:

private class MemoizedTest
{
    private int _counter = 0;

    public int Method(int p) => this.Memoized(p, x =>
    {
        return _counter += x;
    });
}

Implementation (one pretty small extension class):

namespace System
{
    public static class MemoizerExtension
    {
        internal static ConditionalWeakTable<object, ConcurrentDictionary<string, object>> _weakCache =
            new ConditionalWeakTable<object, ConcurrentDictionary<string, object>>();

        public static TResult Memoized<T1, TResult>(
            this object context,
            T1 arg,
            Func<T1, TResult> f,
            [CallerMemberName] string? cacheKey = null)
            where T1 : notnull
        {
            if (context == null) throw new ArgumentNullException(nameof(context));
            if (cacheKey == null) throw new ArgumentNullException(nameof(cacheKey));

            var objCache = _weakCache.GetOrCreateValue(context);

            var methodCache = (ConcurrentDictionary<T1, TResult>) objCache
                .GetOrAdd(cacheKey, _ => new ConcurrentDictionary<T1, TResult>());

            return methodCache.GetOrAdd(arg, f);
        }
    }
}

Explanation In the implementation I'm using a ConditionalWeakTable for caching, effectively extending the internal structure of the object invoking the memoization. As an additional key, the CallerMemberName is used, acting as a second key (this allows more memoizations for instance, and optionally more memoizations per method, if passing the cacheKey parameter explicitly). The third key is the parameter of the invocation.

So, we have 3 runtime dictionary-like searches instead of 1, but a syntax that is a lot cleaner, IMO.

Is it worth it? I dunno, but my desire for elegance is satiated.

If someone else is interested, I'm including the tests for reference:

[TestFixture]
public class MemoizerTest
{
    [Test]
    public void MemoizationWorksOnFuncs()
    {
        int counter = 0;

        Func<int, int> f = x => counter += x;

        Assert.That(this.Memoized(1, f), Is.EqualTo(1));

        Assert.That(this.Memoized(2, f), Is.EqualTo(3));

        Assert.That(this.Memoized(2, f), Is.EqualTo(3));

        Assert.That(this.Memoized(1, f), Is.EqualTo(1));
    }

    private class MemoizedTest
    {
        private int _counter = 0;

        public int Method(int p)
            => this.Memoized(p, x => { return _counter += x; });
    }

    [Test]
    public void MemoizationWorksOnInstances()
    {
        var obj1 = new MemoizedTest();

        Assert.That(obj1.Method(5), Is.EqualTo(5));
        Assert.That(obj1.Method(4), Is.EqualTo(9));
        Assert.That(obj1.Method(5), Is.EqualTo(5));
        Assert.That(obj1.Method(1), Is.EqualTo(10));
        Assert.That(obj1.Method(4), Is.EqualTo(9));

        obj1 = new MemoizedTest();

        Assert.That(obj1.Method(5), Is.EqualTo(5));
        Assert.That(obj1.Method(4), Is.EqualTo(9));
        Assert.That(obj1.Method(5), Is.EqualTo(5));
        Assert.That(obj1.Method(1), Is.EqualTo(10));
        Assert.That(obj1.Method(4), Is.EqualTo(9));
    }

    [Test]
    [Ignore("This test passes only when compiled in Release mode")]
    public void WeakMemoizationCacheIsCleared()
    {
        var obj1 = new MemoizedTest();

        var r1 = obj1.Method(5);

        MemoizerExtension._weakCache.TryGetValue(obj1, out var cache);

        var weakRefToCache = new WeakReference(cache);

        cache = null;
        GC.Collect(2);

        obj1 = null;

        GC.Collect();
        GC.Collect();

        var msg = weakRefToCache.TrackResurrection;

        Assert.That(weakRefToCache.IsAlive, Is.False, "The weak reference should be dead.");

        Assert.That(r1, Is.EqualTo(5));
    }
}
orad
  • 15,272
  • 23
  • 77
  • 113
Alberto Chiesa
  • 7,022
  • 2
  • 26
  • 53
5

If you capture the dictionary in the lambda, your state will be maintained implicitly.

public class Memoized
{
    // Example with a single parameter. That parameter becomes the key to the dictionary.
    public static Func<T1, TRet> Of<T1, TRet>(Func<T1, TRet> f)
    {
        ConcurrentDictionary<T1, TRet> cache = new ConcurrentDictionary<T1, TRet>();
        return (arg1) => cache.GetOrAdd(arg1, xarg=>f(xarg));
    }
    // Example with two parameters. The key is a tuple, and it must be unpacked before calling func.
    // Three or more parameters generalize from this
    public static Func<T1, T2, TRet> Of<T1, T2, TRet>(Func<T1, T2, TRet> f)
    {
        ConcurrentDictionary<Tuple<T1,T2>, TRet> cache = new ConcurrentDictionary<Tuple<T1, T2>, TRet>();
        return (arg1, arg2) => cache.GetOrAdd(new Tuple<T1,T2>(arg1, arg2), 
            (xarg)=>f(xarg.Item1, xarg.Item2)
            );
    }

}

Usage example:

class Test
{
    public int Method(String s, String s2)
    {
        return 99;
    }
}
class Program
{
    static bool UncachedMethod(int x) { return x > 0; }
    static void Main(string[] args)
    {
        var cached = Memoized.Of<int,bool>(UncachedMethod);

        var exampleCall = cached(44);
        var exampleCall2 = cached(44);

        // Capture a non-static member function
        var x = new Test();
        var cachedX = Memoized.Of<String, String,int>(x.Method);

        var exampleCall3 = cachedX("a","b");

    }
}
Ben
  • 34,935
  • 6
  • 74
  • 113
  • 1
    This is a good solution, but I note that it might be better to use value tuples in C# 7. – Eric Lippert Nov 13 '18 at 20:19
  • Did this post get only the second up-vote ever by @EricLippert? – ChiefTwoPencils Dec 02 '18 at 01:48
  • 1
    All the examples assume at least one argument; what would be a good approach for parameterless calls? – x0n Sep 17 '21 at 19:34
  • @x0n, for parameterless calls you can use Lazy. – Sat Aug 22 '23 at 10:25
  • @Ben, several improvements for the single parameter implementation to support null values for input argument: `public static Func Of( Func f ) { Lazy? nullArgFactory = null; if (default(T1) == null) nullArgFactory = new Lazy(() => f(default!)); #pragma warning disable CS8714 // Note: we are using separate branch for null values. var cache = new ConcurrentDictionary(); #pragma warning restore CS8714 return arg => { if (arg == null) return nullArgFactory!.Value; return cache.GetOrAdd(arg, f); }; }` – Sat Aug 22 '23 at 10:41