Use a Task to avoid multiple calls to expensive operation and to cache its result

Question

I have an async method that fetches some data from a database. This operation is fairly expensive, and takes a long time to complete. As a result, I'd like to cache the method's return value. However, it's possible that the async method will be called multiple times before its initial execution has a chance to return and save its result to the cache, resulting in multiple calls to this expensive operation.

To avoid this, I'm currently reusing a Task, like so:

public class DataAccess
{
    private Task<MyData> _getDataTask;

    public async Task<MyData> GetDataAsync()
    {
        if (_getDataTask == null)
        {
            _getDataTask = Task.Run(() => synchronousDataAccessMethod());
        }

        return await _getDataTask;
    }
}

My thought is that the initial call to GetDataAsync will kick off the synchronousDataAccessMethod method in a Task, and any subsequent calls to this method before the Task has completed will simply await the already running Task, automatically avoiding calling synchronousDataAccessMethod more than once. Calls made to GetDataAsync after the private Task has completed will cause the Task to be awaited, which will immediately return the data from its initial execution.

This seems to be working, but I'm having some strange performance issues that I suspect may be tied to this approach. Specifically, awaiting _getDataTask after it has completed takes several seconds (and locks the UI thread), even though the synchronousDataAccessMethod call is not called.

Am I misusing async/await? Is there a hidden gotcha that I'm not seeing? Is there a better way to accomplish the desired behavior?

EDIT

Here's how I call this method:

var result = (await myDataAccessObject.GetDataAsync()).ToList();

Maybe it has something to do with the fact that the result is not immediately enumerated?

You can just remove the `async` and `await` usages from the `GetDataAsync()` method definition. Does that change anything to your issue? — Jean Hominal, Sep 11 '14 at 21:40
I'd like to keep the method as `async`, since I'll be `await`-ing it further up the callstack. — Nathan Friend, Sep 11 '14 at 21:43
In this case, what I usually do is don't care about this issue. The important thing is that the variable storing the cached result should be atomic, so that it doesn't break by being set multiple times concurrently the first times the method is called. It might happen that the expensive computation is performed multiple times at the beginning, but then it's cached. If the cached value has a long lifetime, this is better than doing expensive synchronization at every call. — Giulio Franco, Sep 11 '14 at 21:44
The method will still return asynchronously! But you will have elided a round of the state machine that underlies the keywords. — Jean Hominal, Sep 11 '14 at 21:45
Could you show us the code that calls the `GetDataAsync` method? — Jean Hominal, Sep 11 '14 at 21:59
@JeanHominal: Edited my answer to show how I'm calling `GetDataAsync`. — Nathan Friend, Sep 11 '14 at 22:08
Could you separate the two operations (`await` and `ToList`) to separate lines? And, for example, use a `Stopwatch` to time them? I suspect that your issue is related to the `ToList` more than the `await`, but I cannot be sure until you test it and report the results back. — Jean Hominal, Sep 11 '14 at 22:18
Yep, the `.ToList` is the culprit. The method call itself returns immediately, but the `.ToList` call takes ~15 seconds. — Nathan Friend, Sep 11 '14 at 22:37
OK, then it is strange - a `ToList` on an in-memory object enumeration of 300 items should not take long. Could you tell us more, in the question, about what `MyData` is and does - e.g., does `MyData` do database accesses as part of an enumeration? what library does `MyData` come from? what is the implementation of `synchronousDataAccessMethod`? — Jean Hominal, Sep 11 '14 at 22:56
However, once you know that information, I would recommend posting a new question - the new question is barely related to the one that you ask here. — Jean Hominal, Sep 11 '14 at 22:57

score 7 · Answer 1 · answered Sep 11 '14 at 21:49

If you want to await it further up the call stack, I think you want this:

public class DataAccess
{
    private Task<MyData> _getDataTask;
    private readonly object lockObj = new Object();

    public async Task<MyData> GetDataAsync()
    {
        lock(lockObj)
        {
            if (_getDataTask == null)
            {
                _getDataTask = Task.Run(() => synchronousDataAccessMethod());
            }
        }
        return await _getDataTask;
    }
}

Your original code has the potential for this happening:

Thread 1 sees that _getDataTask == null, and begins constructing the task
Thread 2 sees that _getDataTask == null, and begins constructing the task
Thread 1 finishes constructing the task, which starts, and Thread 1 waits on that task
Thread 2 finishes constructing a task, which starts, and Thread 2 waits on that task

You end up with two instances of the task running.

But why is the UI thread blocked? And is the OP mistaken when he reports that `synchronousDataAccessMethod` is not called? — Jean Hominal, Sep 11 '14 at 21:56

score 1 · Accepted Answer · answered Sep 11 '14 at 21:41

1

Use the lock function to prevent multiple calls to the database query section. Lock will make it thread safe so that once it has been cached all the other calls will use it instead of running to the database for fulfillment.

lock(StaticObject) // Create a static object so there is only one value defined for this routine
{
    if(_getDataTask == null)
    {
         // Get data code here
    }
    return _getDataTask
}

answered Sep 11 '14 at 21:41

lede

182
2

static object would mean there won't be individual locks for individual instances see Jim Mischel, it should be the accepted answer – jbat100 Feb 27 '19 at 10:59

score 1 · Answer 3 · answered Sep 11 '14 at 21:48

1

Please rewrite your function as:

public Task<MyData> GetDataAsync()
{
    if (_getDataTask == null)
    {
        _getDataTask = Task.Run(() => synchronousDataAccessMethod());
    }

    return _getDataTask;
}

This should not change at all the things that can be done with this function - you can still await on the returned task!

Please tell me if that changes anything.

answered Sep 11 '14 at 21:48

Jean Hominal

16,518
5
56
90

1

You'll need to protect the task creation with a lock. Otherwise you can end up with multiple tasks being created and running simultaneously. – Jim Mischel Sep 11 '14 at 21:50
@Jim: First, I do not think that the task being created multiple times is the OP's problem - he says that the UI thread is locked, meaning that he probably calls the function from the UI thread - in that case, there is no point to a lock. And second - I do not actually care about multiple tasks being created - as it is not part of the problem described by the OP (if multiple tasks were created then the OP should be able to tell that `synchronousDataAccessMethod` was run multiple times). – Jean Hominal Sep 11 '14 at 21:54
@JeanHominal You're right, I can tell that the inner data access function is only called once. I removed `async` and `await` from my method (I didn't know you could do this and still `await` the method later!), but I'm still having the same issue. I think the underlying problem may be unrelated to how I'm calling the method. – Nathan Friend Sep 11 '14 at 21:58

score 0 · Answer 4 · answered Sep 26 '16 at 16:32

Bit late to answer this but there is an open source library called LazyCache that will do this for you in two lines of code and it was recently updated to handle caching Tasks for just this sort of situation. It is also available on nuget.

Example:

    Func<Task<List<MyData>>> cacheableAsyncFunc = () => myDataAccessObject.GetDataAsync();

    var cachedData = await cache.GetOrAddAsync("myDataAccessObject.GetData", cacheableAsyncFunc);

    return cachedData;

    // Or instead just do it all in one line if you prefer
    // return await cache.GetOrAddAsync("myDataAccessObject.GetData", myDataAccessObject.GetDataAsync);
}

It has built in locking by default so the cacheable method will only execute once per cache miss, and it uses a lamda so you can do "get or add" in one go. It defaults to 20 minutes sliding expiration but you can set whatever caching policy you like on it.

More info on caching tasks is in the api docs and you may find the sample app to demo caching tasks useful.

(Disclaimer: I am the author of LazyCache)

Use a Task to avoid multiple calls to expensive operation and to cache its result

4 Answers4