
Background:

I maintain several Winforms apps and class libraries that either could or already do benefit from caching. I'm also aware of the Caching Application Block and the System.Web.Caching namespace (which, from what I've gathered, is perfectly OK to use outside ASP.NET).

I've found that, although both of the above classes are technically "thread safe" in the sense that individual methods are synchronized, they don't really seem to be designed particularly well for multi-threaded scenarios. Specifically, they don't implement a GetOrAdd method similar to the one in the new ConcurrentDictionary class in .NET 4.0.

I consider such a method to be a primitive for caching/lookup functionality, and obviously the Framework designers realized this too - that's why the methods exist in the concurrent collections. However, aside from the fact that I'm not using .NET 4.0 in production apps yet, a dictionary is not a full-fledged cache - it doesn't have features like expirations, persistent/distributed storage, etc.
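
Put differently, the primitive I'm after looks something like this (a hypothetical interface, purely to illustrate - Func&lt;T&gt; requires .NET 3.5):

public interface IAtomicCache
{
    // Atomically: return the cached value for key if present; otherwise
    // invoke valueFactory exactly once, cache its result, and return it.
    // Concurrent callers asking for the same key should block until the
    // value is ready rather than running valueFactory a second time.
    T GetOrAdd<T>(string key, Func<T> valueFactory);
}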


Why this is important:

A fairly typical design in a "rich client" app (or even some web apps) is to start pre-loading a cache as soon as the app starts, blocking if the client requests data that is not yet loaded (subsequently caching it for future use). If the user is plowing through his workflow quickly, or if the network connection is slow, it's not unusual at all for the client to be competing with the preloader, and it really doesn't make a lot of sense to request the same data twice, especially if the request is relatively expensive.

So I seem to be left with a few equally lousy options:

  • Don't try to make the operation atomic at all, and risk the data being loaded twice (and possibly have two different threads operating on different copies);

  • Serialize access to the cache, which means locking the entire cache just to load a single item;

  • Start reinventing the wheel just to get a few extra methods.


Clarification: Example Timeline

Say that when an app starts, it needs to load 3 datasets which each take 10 seconds to load. Consider the following two timelines:

00:00 - Start loading Dataset 1
00:10 - Start loading Dataset 2
00:19 - User asks for Dataset 2

In the above case, if we don't use any kind of synchronization, the user has to wait a full 10 seconds for data that will be available in 1 second, because the code will see that the item is not yet loaded into the cache and try to reload it.

00:00 - Start loading Dataset 1
00:10 - Start loading Dataset 2
00:11 - User asks for Dataset 1

In this case, the user is asking for data that's already in the cache. But if we serialize access to the cache, he'll have to wait another 9 seconds for no reason at all, because the cache manager (whatever that is) has no awareness of the specific item being asked for, only that "something" is being requested and "something" is in progress.
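
To make the failure modes concrete, here is a sketch of the naive check-then-load pattern and its fully-serialized "fix" (the _cache, _cacheLock and LoadDataset names are hypothetical):

// Option 1: not atomic. Two threads that miss on the same key both call
// LoadDataset, duplicating the expensive request (first timeline above).
public DataSet GetDataset(string key)
{
    DataSet ds = (DataSet)_cache.Get(key);   // each call is thread-safe...
    if (ds == null)
    {
        ds = LoadDataset(key);               // ...but nothing stops a second
        _cache.Put(key, ds);                 // thread from getting here too
    }
    return ds;
}

// Option 2: serialize the whole cache. Now a request for an already-cached
// item blocks behind an unrelated 10-second load (second timeline above).
public DataSet GetDatasetSerialized(string key)
{
    lock (_cacheLock)
    {
        DataSet ds = (DataSet)_cache.Get(key);
        if (ds == null)
        {
            ds = LoadDataset(key);
            _cache.Put(key, ds);
        }
        return ds;
    }
}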


The Question:

Are there any caching libraries for .NET (pre-4.0) that do implement such atomic operations, as one might expect from a thread-safe cache?

Or, alternatively, is there some means to extend an existing "thread-safe" cache to support such operations, without serializing access to the cache (which would defeat the purpose of using a thread-safe implementation in the first place)? I doubt that there is, but maybe I'm just tired and ignoring an obvious workaround.

Or... is there something else I'm missing? Is it just standard practice to let two competing threads steamroll each other if they happen to both be requesting the same item, at the same time, for the first time or after an expiration?

Aaronaught
  • I am curious what you mean by not serializing access to the cache. One way or another, access to the same exact resource (be it during initial creation or after expiration) would have to be serialized. It should be possible to serialize access via key rather than to the whole cache, but at some point, some serialization would have to be required...unless I am missing something myself... – jrista Feb 24 '10 at 23:49
  • @jrista: Serialization isn't the only means of thread-safety; there's also reader-writer locks, etc. More to the point, though, a thread-safe library should be handling all of this logic by itself. All cache implementations I've seen are only "thread safe" in the sense of "multiple threads invoking operations at the same time cannot corrupt the cache", but what I want is "built-in support for common multi-step atomic operations, such as lazy-loading of cache items". – Aaronaught Feb 25 '10 at 00:00
  • Well, even in the case of a reader-writer lock...if there is a single writer, everything, all other writers and all readers, are blocked until the single writer releases. Outside of the most basic of operations (such as increment/decrement, exchange, etc. accessible through the Interlocked class), serialization is generally an unavoidable consequence of thread synchronization. You can bury it away and hide it as much as you want, but at some point, it happens. If you want a thread-safe cache that does what you are looking for, it is possible, but you would likely have to write it yourself... – jrista Feb 25 '10 at 03:56
  • ...I think the key you are looking for is finding a way to serialize as few threads as possible. That IS doable, although it is likely not trivial. You do not necessarily need to lock your entire cache and serialize every thread that is trying to use it...you just need to find a clever way to synchronize only the threads that are trying to access the same thing in your cache (or as few threads as possible.) You can achieve that in a few ways, such as partitioning, lockable & shared lookup keys (really complicated, but if you figure it out, it's the finest grain), etc. – jrista Feb 25 '10 at 04:02
  • @jrista: If I have to write it myself then so be it, I was hoping for alternatives. I don't think it's unreasonable to expect a cache to be able to serialize access to *individual items* as opposed to the entire dictionary, nor is it unusual to want the cache to be able to say, "hey, hang on for a few more seconds, it's already on its way" as opposed to just "yes I have it" or "no I don't have it." – Aaronaught Feb 25 '10 at 04:02
  • @jrista: I don't think it's really that complicated, compared to other aspects of the cache. Instead of simply storing values in the lookup, you store tuples of values and ready-states. If a key exists but is not ready, block on that until it is ready. I could do this, but it's deep in the bowels of any caching implementation, which means I'd have to rewrite the *rest* of it too, including expirations, scavenging, and all those nice things that already exist in libraries like EntLib. – Aaronaught Feb 25 '10 at 04:06
  • @Aaronaught: Oh, I was never saying it was unreasonable. I think it is entirely reasonable. It is just very difficult. I have tried many times to write a coherent, concurrent collection that locked on the finest grain...it is no trivial task, and the closest I have ever come is to use partitioning (break the cache up into multiple partitioned sets via some kind of key or hash, and lock individual partitions.) Partitioning is still not ideal, as you can still serialize a fair amount of threads. It's even possible to serialize all of your threads if they need something in the same partition... – jrista Feb 25 '10 at 04:08
  • ...I think the holy grail of coherent concurrent collections (or rather, hashtable/dictionary/cache) would be to find a way to create and use some kind of singleton key wrapper for each key, which can then be locked. That would allow you to serialize only the threads that need the same object, and ignore any other threads. I haven't figured that one out yet, though. – jrista Feb 25 '10 at 04:10
  • @jrista: It's not really an issue of which threads to serialize. It's simple: (a) store a mutex/event with each value, (b) create the entry immediately in a `GetOrAdd` method, unsignaled, (c) when retrieving an entry, as the very last step, wait until the event is signaled before finally returning the value. It's not as good as `ConcurrentDictionary` but it's good enough. The problem really is not complexity of implementation, it's the amount of time it would take me to implement and debug a feature-complete cache from scratch. – Aaronaught Feb 25 '10 at 04:20
  • Why do you need to reimplement all the features of a robust cache? EntLib Caching AB has a mechanism for hooking into item expiration (`ICacheItemRefreshAction`) that would allow you to keep a tuple in the cache if its composed wait handle (`ManualResetEventSlim`?) is still unsignalled. – G-Wiz Feb 25 '10 at 06:33
  • @Aaronaught It takes 10 seconds to load the data - is this from a slow connection or lots of data? If it's the second, then would lazy loading per record be a simpler approach? – Chris S Feb 25 '10 at 11:37
  • @gWiz: I actually got to thinking about this just after I left. It looks like I might be able to wrap the EntLib Cache (unfortunately it wasn't designed for subclassing) and simply change the internal items stored in it. Going to investigate that this morning. I'll update when I have some data on performance/reliability. Unfortunately `ManualResetEventSlim` is only found in .NET 4, I'll have to use a regular `ManualResetEvent`, but this is a cache, not a dictionary, so there shouldn't be millions of entries. – Aaronaught Feb 25 '10 at 14:17
  • @Chris S: The very idea behind caching is that retrieving the data might be slow/expensive. It might not take 10 seconds, but in a networked/distributed application you always want to *minimize* the number of round-trips; loading everything on demand would make the app a lot less responsive. – Aaronaught Feb 25 '10 at 14:19
  • But you've said it's a Winforms application, not a distributed application, have you not? – Chris S Feb 26 '10 at 11:14
  • To clarify what I'm saying @Aaronaught, I've tried something similar with a mobile application, but then I looked at the competing products that sell in the hundreds of thousands, and they all simply make you wait instead of being intelligent. That's not to say that's the correct way, just that you can't always predict how the user will be doing their workflow. I'll be interested to see the final product though if you're willing to share it, and it'd be an admirable achievement (and something I'd love to use :P). – Chris S Feb 26 '10 at 12:07
  • @Chris S: I think the lack of closures before .NET 3.5 may factor into the inability of most off-the-shelf cache libraries to lazy-load; even the Framework itself didn't get true concurrent collections until, well, now. I personally am not happy with having a solution that's simply no *worse* than other things out there. And I do have what seems to be a working solution; I'm happy to put it up somewhere once it's been properly documented and stress-tested. (Also, it should work just as well with a distributed cache, doesn't have to be Winforms) – Aaronaught Feb 26 '10 at 15:04
  • I would also appreciate any libraries in .NET 4.0 or later. – Ufuk Hacıoğulları Jul 30 '12 at 21:19

4 Answers


I know your pain, as I am one of the architects of Dedoose. I have messed around with a lot of caching libraries and ended up building this one after much tribulation. The one assumption for this cache manager is that all objects stored in these collections implement an interface exposing a Guid `Id` property. Since this is for a RIA, it also includes a lot of methods for adding/updating/removing items from these collections.

Here's my CollectionCacheManager

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class CollectionCacheManager
{
    private static readonly object _objLockPeek = new object();
    private static readonly Dictionary<String, object> _htLocksByKey = new Dictionary<string, object>();
    private static readonly Dictionary<String, CollectionCacheEntry> _htCollectionCache = new Dictionary<string, CollectionCacheEntry>();

    private static DateTime? _dtLastPurgeCheck;  //null until the first purge check runs

    public static List<T> FetchAndCache<T>(string sKey, Func<List<T>> fGetCollectionDelegate) where T : IUniqueIdActiveRecord
    {
        List<T> colItems;
        List<T> objReturnCollection;

        lock (GetKeyLock(sKey))
        {
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                colItems = (List<T>)objCacheEntry.Collection;
                objCacheEntry.LastAccess = DateTime.Now;
            }
            else
            {
                colItems = fGetCollectionDelegate();
                SaveCollection<T>(sKey, colItems);
            }

            //Clone while still holding the key lock - another thread could
            //otherwise mutate the cached list while we are copying it
            objReturnCollection = CloneCollection<T>(colItems);
        }

        return objReturnCollection;
    }

    public static List<Guid> FetchAndCache(string sKey, Func<List<Guid>> fGetCollectionDelegate)
    {
        List<Guid> colIds;
        List<Guid> colReturnIds;

        lock (GetKeyLock(sKey))
        {
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                colIds = (List<Guid>)objCacheEntry.Collection;
                objCacheEntry.LastAccess = DateTime.Now;
            }
            else
            {
                colIds = fGetCollectionDelegate();
                SaveCollection(sKey, colIds);
            }

            //Clone while still holding the key lock - another thread could
            //otherwise mutate the cached list while we are copying it
            colReturnIds = CloneCollection(colIds);
        }

        return colReturnIds;
    }


    private static List<T> GetCollection<T>(string sKey) where T : IUniqueIdActiveRecord
    {
        List<T> objReturnCollection = null;

        lock (GetKeyLock(sKey))
        {
            //Check inside the lock so the existence check and the read
            //are atomic with respect to other operations on this key
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                objCacheEntry.LastAccess = DateTime.Now;

                if (objCacheEntry.Collection is List<T>)
                {
                    objReturnCollection = CloneCollection<T>((List<T>)objCacheEntry.Collection);
                }
            }
        }

        return objReturnCollection;
    }


    public static void SaveCollection<T>(string sKey, List<T> colItems) where T : IUniqueIdActiveRecord
    {

        CollectionCacheEntry objCacheEntry = new CollectionCacheEntry();

        objCacheEntry.Key = sKey;
        objCacheEntry.CacheEntry = DateTime.Now;
        objCacheEntry.LastAccess = DateTime.Now;
        objCacheEntry.LastUpdate = DateTime.Now;
        objCacheEntry.Collection = CloneCollection(colItems);

        lock (GetKeyLock(sKey))
        {
            _htCollectionCache[sKey] = objCacheEntry;
        }
    }

    public static void SaveCollection(string sKey, List<Guid> colIDs)
    {

        CollectionCacheEntry objCacheEntry = new CollectionCacheEntry();

        objCacheEntry.Key = sKey;
        objCacheEntry.CacheEntry = DateTime.Now;
        objCacheEntry.LastAccess = DateTime.Now;
        objCacheEntry.LastUpdate = DateTime.Now;
        objCacheEntry.Collection = CloneCollection(colIDs);

        lock (GetKeyLock(sKey))
        {
            _htCollectionCache[sKey] = objCacheEntry;
        }
    }

    public static void UpdateCollection<T>(string sKey, List<T> colItems) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                objCacheEntry.LastAccess = DateTime.Now;
                objCacheEntry.LastUpdate = DateTime.Now;
                objCacheEntry.Collection = new List<T>();

                //Clone the collection before insertion to ensure it can't be touched
                foreach (T objItem in colItems)
                {
                    objCacheEntry.Collection.Add(objItem);
                }

                _htCollectionCache[sKey] = objCacheEntry;
            }
            else
            {
                SaveCollection<T>(sKey, colItems);
            }
        }
    }

    public static void UpdateItem<T>(string sKey, T objItem)  where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                List<T> colItems = (List<T>)objCacheEntry.Collection;

                colItems.RemoveAll(o => o.Id == objItem.Id);
                colItems.Add(objItem);

                objCacheEntry.Collection = colItems;

                objCacheEntry.LastAccess = DateTime.Now;
                objCacheEntry.LastUpdate = DateTime.Now;
            }
        }
    }

    public static void UpdateItems<T>(string sKey, List<T> colItemsToUpdate) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            if (_htCollectionCache.ContainsKey(sKey))
            {
                CollectionCacheEntry objCacheEntry = _htCollectionCache[sKey];
                List<T> colCachedItems = (List<T>)objCacheEntry.Collection;

                foreach (T objItem in colItemsToUpdate)
                {
                    colCachedItems.RemoveAll(o => o.Id == objItem.Id);
                    colCachedItems.Add(objItem);
                }

                objCacheEntry.Collection = colCachedItems;

                objCacheEntry.LastAccess = DateTime.Now;
                objCacheEntry.LastUpdate = DateTime.Now;
            }
        }
    }

    public static void RemoveItemFromCollection<T>(string sKey, T objItem) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            List<T> objCollection = GetCollection<T>(sKey);
            if (objCollection != null && objCollection.Any(o => o.Id == objItem.Id))
            {
                objCollection.RemoveAll(o => o.Id == objItem.Id);
                UpdateCollection<T>(sKey, objCollection);
            }
        }
    }

    public static void RemoveItemsFromCollection<T>(string sKey, List<T> colItemsToRemove) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            Boolean bCollectionChanged = false;

            List<T> objCollection = GetCollection<T>(sKey);
            foreach (T objItem in colItemsToRemove)
            {
                if (objCollection != null && objCollection.Any(o => o.Id == objItem.Id))
                {
                    objCollection.RemoveAll(o => o.Id == objItem.Id);
                    bCollectionChanged = true;
                }
            }
            if (bCollectionChanged)
            {
                UpdateCollection<T>(sKey, objCollection);
            }
        }
    }

    public static void AddItemToCollection<T>(string sKey, T objItem) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            List<T> objCollection = GetCollection<T>(sKey);
            if (objCollection != null && !objCollection.Any(o => o.Id == objItem.Id))
            {
                objCollection.Add(objItem);
                UpdateCollection<T>(sKey, objCollection);
            }
        }
    }

    public static void AddItemsToCollection<T>(string sKey, List<T> colItemsToAdd) where T : IUniqueIdActiveRecord
    {
        lock (GetKeyLock(sKey))
        {
            List<T> objCollection = GetCollection<T>(sKey);
            Boolean bCollectionChanged = false;
            foreach (T objItem in colItemsToAdd)
            {
                if (objCollection != null && !objCollection.Any(o => o.Id == objItem.Id))
                {
                    objCollection.Add(objItem);
                    bCollectionChanged = true;
                }
            }
            if (bCollectionChanged)
            {
                UpdateCollection<T>(sKey, objCollection);
            }
        }
    }

    public static void PurgeCollectionByMaxLastAccessInMinutes(int iMinutesSinceLastAccess)
    {
        DateTime dtThreshold = DateTime.Now.AddMinutes(-iMinutesSinceLastAccess);

        if (_dtLastPurgeCheck == null || dtThreshold > _dtLastPurgeCheck)
        {

            lock (_objLockPeek)
            {
                CollectionCacheEntry objCacheEntry;
                List<String> colKeysToRemove = new List<string>();

                foreach (string sCollectionKey in _htCollectionCache.Keys)
                {
                    objCacheEntry = _htCollectionCache[sCollectionKey];
                    if (objCacheEntry.LastAccess < dtThreshold)
                    {
                        colKeysToRemove.Add(sCollectionKey);
                    }
                }

                foreach (String sKeyToRemove in colKeysToRemove)
                {
                    _htCollectionCache.Remove(sKeyToRemove);
                }
            }

            _dtLastPurgeCheck = DateTime.Now;
        }
    }

    public static void ClearCollection(String sKey)
    {
        lock (GetKeyLock(sKey))
        {
            lock (_objLockPeek)
            {
                if (_htCollectionCache.ContainsKey(sKey))
                {
                    _htCollectionCache.Remove(sKey);
                }
            }
        }
    }


    #region Helper Methods
    private static object GetKeyLock(String sKey)
    {
        //Ensure even if hell freezes over this lock exists. The lookup
        //runs entirely under _objLockPeek, because reading the Dictionary
        //while another thread may be writing to it is not thread-safe.
        lock (_objLockPeek)
        {
            if (!_htLocksByKey.ContainsKey(sKey))
            {
                _htLocksByKey[sKey] = new object();
            }

            return _htLocksByKey[sKey];
        }
    }

    private static List<T> CloneCollection<T>(List<T> colItems) where T : IUniqueIdActiveRecord
    {
        //Clone the list - NEVER return the internal cache list
        List<T> objReturnCollection = new List<T>();
        if (colItems != null)
        {
            objReturnCollection.AddRange(colItems);
        }
        return objReturnCollection;
    }

    private static List<Guid> CloneCollection(List<Guid> colIds)
    {
        //Clone the list - NEVER return the internal cache list
        List<Guid> colReturnIds = new List<Guid>();
        if (colIds != null)
        {
            colReturnIds.AddRange(colIds);
        }
        return colReturnIds;
    }
    #endregion

    #region Admin Functions
    public static List<CollectionCacheEntry> GetAllCacheEntries()
    {
        lock (_objLockPeek)
        {
            return _htCollectionCache.Values.ToList();
        }
    }

    public static void ClearEntireCache()
    {
        lock (_objLockPeek)
        {
            _htCollectionCache.Clear();
        }
    }
    #endregion

}

public sealed class CollectionCacheEntry
{
    public String Key;
    public DateTime CacheEntry;
    public DateTime LastUpdate;
    public DateTime LastAccess;
    public IList Collection;
}

Here is an example of how I use it:

public static class ResourceCacheController
{
    #region Cached Methods
    public static List<Resource> GetResourcesByProject(Guid gProjectId)
    {
        String sKey = GetCacheKeyProjectResources(gProjectId);
        List<Resource> colItems = CollectionCacheManager.FetchAndCache<Resource>(sKey, () => ResourceAccess.GetResourcesByProject(gProjectId));
        return colItems;
    } 

    #endregion

    #region Cache Dependant Methods
    public static int GetResourceCountByProject(Guid gProjectId)
    {
        return GetResourcesByProject(gProjectId).Count;
    }

    public static List<Resource> GetResourcesByIds(Guid gProjectId, List<Guid> colResourceIds)
    {
        if (colResourceIds == null || colResourceIds.Count == 0)
        {
            return null;
        }
        return GetResourcesByProject(gProjectId).FindAll(objRes => colResourceIds.Contains(objRes.Id));
    }

    public static Resource GetResourceById(Guid gProjectId, Guid gResourceId)
    {
        return GetResourcesByProject(gProjectId).SingleOrDefault(o => o.Id == gResourceId);
    }
    #endregion

    #region Cache Keys and Clear
    public static void ClearCacheProjectResources(Guid gProjectId)
    {
        CollectionCacheManager.ClearCollection(GetCacheKeyProjectResources(gProjectId));
    }

    public static string GetCacheKeyProjectResources(Guid gProjectId)
    {
        return string.Concat("ResourceCacheController.ProjectResources.", gProjectId.ToString());
    } 
    #endregion

    internal static void ProcessDeleteResource(Guid gProjectId, Guid gResourceId)
    {
        Resource objRes = GetResourceById(gProjectId, gResourceId);
        if (objRes != null)
        {
            CollectionCacheManager.RemoveItemFromCollection(GetCacheKeyProjectResources(gProjectId), objRes);
        }
    }

    internal static void ProcessUpdateResource(Guid gProjectId, Resource objResource)
    {
        CollectionCacheManager.UpdateItem(GetCacheKeyProjectResources(gProjectId), objResource);
    }

    internal static void ProcessAddResource(Guid gProjectId, Resource objResource)
    {
        CollectionCacheManager.AddItemToCollection(GetCacheKeyProjectResources(gProjectId), objResource);
    }
}

Here's the Interface in question:

public interface IUniqueIdActiveRecord
{
    Guid Id { get; set; }

}

Hope this helps. I've been through hell and back a few times to finally arrive at this solution, and for us it's been a godsend; but I can't guarantee that it's perfect, only that we haven't found an issue yet.

JTtheGeek
  • This does indeed looks like it solves the problem at hand, despite being (in my highly biased opinion) not as slick as the one I came up with. ;) Definitely +1 for the implementation and I think I'll change the accepted answer to this one as well, since it handles the deferred loading that's so pivotal to the problem. – Aaronaught Dec 07 '10 at 23:46
  • Lol yeah, I had a few slicker ones before this, but ran into some insane to debug concurrency issues, finally rebuilt the sucker again in the most bulletproof way I possibly could. Hope it helps! – JTtheGeek Dec 07 '10 at 23:50

It looks like the .NET 4.0 concurrent collections utilize new synchronization primitives that spin before switching context, in case a resource is freed quickly. So they're still locking, just in a more opportunistic way. If you think your data retrieval logic is shorter than the timeslice, then it seems like this would be highly beneficial. But you mentioned the network, which makes me think this doesn't apply.

I would wait till you have a simple, synchronized solution in place, and measure the performance and behavior before assuming you will have performance issues related to concurrency.

If you're really concerned about cache contention, you can utilize an existing cache infrastructure and logically partition it into regions. Then synchronize access to each region independently.

As an example strategy: if your data set consists of items that are keyed on numeric IDs and you want to partition your cache into 10 regions, you can take (ID mod 10) to determine which region an item belongs to, and keep an array of 10 objects to lock on. All of the code can be written for a variable number of regions, which can be set via configuration or determined at app start, depending on the total number of items you predict/intend to cache.
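
For example, here's a minimal sketch of that partitioning strategy (all names hypothetical, region count hard-coded to 10):

using System;
using System.Collections.Generic;

public class PartitionedCache
{
    private const int RegionCount = 10;
    private readonly object[] _regionLocks = new object[RegionCount];
    private readonly Dictionary<int, object>[] _regions = new Dictionary<int, object>[RegionCount];

    public PartitionedCache()
    {
        for (int i = 0; i < RegionCount; i++)
        {
            _regionLocks[i] = new object();
            _regions[i] = new Dictionary<int, object>();
        }
    }

    public object GetOrLoad(int id, Func<int, object> load)
    {
        // (mod 10) the ID to pick a region; the double-mod keeps the
        // index non-negative for negative IDs
        int region = (id % RegionCount + RegionCount) % RegionCount;

        // contention is limited to threads whose keys land in this region
        lock (_regionLocks[region])
        {
            object value;
            if (!_regions[region].TryGetValue(id, out value))
            {
                value = load(id);
                _regions[region].Add(id, value);
            }
            return value;
        }
    }
}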

If your cache hits are keyed in an abnormal way, you'll have to come up with some custom heuristic to partition the cache.

Update (per comment): Well, this has been fun. I think the following is about as fine-grained locking as you can hope for without going totally insane (or maintaining/synchronizing a dictionary of locks for each cache key). I haven't tested it so there are probably bugs, but the idea should be illustrated: track a list of requested IDs, and then use that to decide whether you need to get the item yourself or whether you merely need to wait for a previous request to finish. Waiting (and cache insertion) is synchronized with tightly-scoped thread blocking and signaling using Wait and PulseAll. Access to the requested ID list is synchronized with a tightly-scoped ReaderWriterLockSlim.

This is a read-only cache. If you're doing creates/updates/deletes, you'll have to make sure you remove IDs from _requestedIds once they're received (before the call to Monitor.PulseAll(_cache), you'll want to add another try..finally and acquire the _requestedIdsLock write lock). Also, with creates/updates/deletes, the easiest way to manage the cache would be to merely remove the existing item from _cache if/when the underlying create/update/delete operation succeeds.

(Oops, see update 2 below.)

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Threading;

public class Item
{
    public int ID { get; set; }
}

public class AsyncCache
{
    protected static readonly Dictionary<int, Item> _externalDataStoreProxy = new Dictionary<int, Item>();

    protected static readonly Dictionary<int, Item> _cache = new Dictionary<int, Item>();

    protected static readonly HashSet<int> _requestedIds = new HashSet<int>();
    protected static readonly ReaderWriterLockSlim _requestedIdsLock = new ReaderWriterLockSlim();

    public Item Get(int id)
    {
        // if item does not exist in cache
        if (!_cache.ContainsKey(id))
        {
            _requestedIdsLock.EnterUpgradeableReadLock();
            try
            {
                // if item was already requested by another thread
                if (_requestedIds.Contains(id))
                {
                    _requestedIdsLock.ExitUpgradeableReadLock();
                    lock (_cache)
                    {
                        while (!_cache.ContainsKey(id))
                            Monitor.Wait(_cache);

                        // once we get here, _cache has our item
                    }
                }
                // else, item has not yet been requested by a thread
                else
                {
                    _requestedIdsLock.EnterWriteLock();
                    try
                    {
                        // record the current request
                        _requestedIds.Add(id);
                        _requestedIdsLock.ExitWriteLock();
                        _requestedIdsLock.ExitUpgradeableReadLock();

                        // get the data from the external resource
                        #region fake implementation - replace with real code
                        var item = _externalDataStoreProxy[id];
                        Thread.Sleep(10000);
                        #endregion

                        lock (_cache)
                        {
                            _cache.Add(id, item);
                            Monitor.PulseAll(_cache);
                        }
                    }
                    finally
                    {
                        // let go of any held locks
                        if (_requestedIdsLock.IsWriteLockHeld)
                            _requestedIdsLock.ExitWriteLock();
                    }
                }
            }
            finally
            {
                // let go of any held locks
                if (_requestedIdsLock.IsUpgradeableReadLockHeld)
                    _requestedIdsLock.ExitUpgradeableReadLock();
            }
        }

        return _cache[id];
    }

    public Collection<Item> Get(Collection<int> ids)
    {
        // materialize immediately - a deferred query would be re-evaluated
        // against _cache and _requestedIds after they are mutated below
        var notInCache = ids.Except(_cache.Keys).ToList();

        // if some items don't exist in cache
        if (notInCache.Count() > 0)
        {
            _requestedIdsLock.EnterUpgradeableReadLock();
            try
            {
                var needToGet = notInCache.Except(_requestedIds).ToList();  // materialize before mutating _requestedIds

                // if any items have not yet been requested by other threads
                if (needToGet.Count() > 0)
                {
                    _requestedIdsLock.EnterWriteLock();
                    try
                    {
                        // record the current request - only the IDs this
                        // thread is actually going to fetch
                        foreach (var id in needToGet)
                            _requestedIds.Add(id);

                        _requestedIdsLock.ExitWriteLock();
                        _requestedIdsLock.ExitUpgradeableReadLock();

                        // get the data from the external resource
                        #region fake implementation - replace with real code
                        var data = new Collection<Item>();
                        foreach (var id in needToGet)
                        {
                            var item = _externalDataStoreProxy[id];
                            data.Add(item);
                        }
                        Thread.Sleep(10000);
                        #endregion

                        lock (_cache)
                        {
                            foreach (var item in data)
                                _cache.Add(item.ID, item);

                            Monitor.PulseAll(_cache);
                        }
                    }
                    finally
                    {
                        // let go of any held locks
                        if (_requestedIdsLock.IsWriteLockHeld)
                            _requestedIdsLock.ExitWriteLock();
                    }
                }

                if (_requestedIdsLock.IsUpgradeableReadLockHeld)
                    _requestedIdsLock.ExitUpgradeableReadLock();

                var waitingFor = notInCache.Except(needToGet);
                // if any remaining items were already requested by other threads
                if (waitingFor.Count() > 0)
                {
                    lock (_cache)
                    {
                        while (waitingFor.Count() > 0)
                        {
                            Monitor.Wait(_cache);
                            waitingFor = waitingFor.Except(_cache.Keys);
                        }

                        // once we get here, _cache has all our items
                    }
                }
            }
            finally
            {
                // let go of any held locks
                if (_requestedIdsLock.IsUpgradeableReadLockHeld)
                    _requestedIdsLock.ExitUpgradeableReadLock();
            }
        }

        return new Collection<Item>(ids.Select(id => _cache[id]).ToList());
    }
}

Update 2:

I misunderstood the behavior of UpgradeableReadLock... only one thread at a time can hold an UpgradeableReadLock. So the above should be refactored to only grab Read locks initially, and to completely relinquish them and acquire a full-fledged Write lock when adding items to _requestedIds.
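
A minimal sketch of that refactoring for the single-item Get, reusing the fields from the class above:

// Take only a read lock for the common check, release it completely, then
// acquire a full write lock and re-check before recording the request -
// another thread may have recorded it between the two lock acquisitions.
bool shouldFetch;
_requestedIdsLock.EnterReadLock();
try
{
    shouldFetch = !_requestedIds.Contains(id);
}
finally
{
    _requestedIdsLock.ExitReadLock();
}

if (shouldFetch)
{
    _requestedIdsLock.EnterWriteLock();
    try
    {
        if (_requestedIds.Contains(id))
            shouldFetch = false;    // lost the race; wait on _cache instead
        else
            _requestedIds.Add(id);
    }
    finally
    {
        _requestedIdsLock.ExitWriteLock();
    }
}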

G-Wiz
  • Thanks, that's pretty close to the implementation I had in mind; the reason I asked the question is that implementing my own cache also requires me to implement expirations, scavenging, persistence, stream serialization, and all of the other thousands of lines of code you'll find in a production-grade caching implementation, which is what I want to avoid. I'll definitely give you +1 for the implementation; still, the larger problem remains unsolved. – Aaronaught Feb 25 '10 at 04:14
  • I didn't end up using this implementation but you had the best answer, so you get the point! – Aaronaught Mar 04 '10 at 01:17

I implemented a simple library named MemoryCacheT. It's on GitHub and NuGet. It basically stores items in a ConcurrentDictionary, and you can specify an expiration strategy when adding items. Any feedback, reviews, or suggestions are welcome.

Ufuk Hacıoğulları

Finally came up with a workable solution to this, thanks to some dialogue in the comments. What I did was create a wrapper: a partially-implemented abstract base class that uses any standard cache library as the backing cache (the backing cache just needs to implement the Contains, Get, Put, and Remove methods). At the moment I'm using the EntLib Caching Application Block for that, and it took a while to get this up and running because some aspects of that library are... well... not that well-thought-out.

Anyway, the total code is now close to 1k lines so I'm not going to post the entire thing here, but the basic idea is:

  1. Intercept all calls to the Get, Put/Add, and Remove methods.

  2. Instead of adding the original item, add an "entry" item which contains a ManualResetEvent in addition to a Value property. As per some advice given to me on an earlier question today, the entry implements a countdown latch, which is incremented whenever the entry is acquired and decremented whenever it is released. Both the loader and all future lookups participate in the countdown latch, so when the counter hits zero, the data is guaranteed to be available and the ManualResetEvent is destroyed in order to conserve resources.

  3. When an entry has to be lazy-loaded, the entry is created and added to the backing cache right away, with the event in an unsignaled state. Subsequent calls to either the new GetOrAdd method or the intercepted Get methods will find this entry, and either wait on the event (if the event exists) or return the associated value immediately (if the event does not exist); see the sketch after this list.

  4. The Put method adds an entry with no event; these look the same as entries for which lazy-loading has already been completed.

  5. Because the GetOrAdd still implements a Get followed by an optional Put, this method is synchronized (serialized) against the Put and Remove methods, but only to add the incomplete entry, not for the entire duration of the lazy load. The Get methods are not serialized; effectively the entire interface works like an automatic reader-writer lock.
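
For illustration, here's a minimal sketch of the entry wrapper described in steps 2 and 3 (the names are hypothetical, and the real class also maintains the countdown latch and disposes the event):

using System.Threading;

internal sealed class LazyCacheEntry
{
    private readonly ManualResetEvent _loaded;   // null for plain Put entries
    private object _value;

    // Entry created by GetOrAdd before the lazy load begins:
    // unsignaled until the loading thread supplies the value.
    public LazyCacheEntry()
    {
        _loaded = new ManualResetEvent(false);
    }

    // Entry created by a plain Put: the value is available immediately.
    public LazyCacheEntry(object value)
    {
        _value = value;
    }

    // Called by the loading thread when the expensive load completes.
    public void SetValue(object value)
    {
        _value = value;
        if (_loaded != null)
            _loaded.Set();          // wake every thread waiting on this entry
    }

    // Called by readers; blocks only while this particular key is loading.
    public object GetValue()
    {
        if (_loaded != null)
            _loaded.WaitOne();
        return _value;
    }
}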

It's still a work in progress, but I've run it through a dozen unit tests and it seems to be holding up. It behaves correctly for both the scenarios described in the question. In other words:

  • A long-running lazy-load (GetOrAdd) for key X (simulated with Thread.Sleep) which takes 10 seconds, followed by another GetOrAdd for the same key X on a different thread exactly 9 seconds later, results in both threads receiving the correct data at the same time (10 seconds from T0). Loads are not duplicated.

  • Immediately loading a value for key X, then starting a long-running lazy-load for key Y, then requesting key X on another thread (before Y is finished), immediately gives back the value for X. Blocking calls are isolated to the relevant key.

It also gives what I think is the most intuitive result for when you begin a lazy-load and then immediately remove the key from the cache; the thread that originally requested the value will get the real value, but any other threads that request the same key at any time after the removal will get nothing back (null) and return immediately.

All in all I'm pretty happy with it. I still wish there was a library that did this for me, but I suppose, if you want something done right... well, you know.

Aaronaught
  • Cool, it would be great to see it if you could find the time/energy/resources to post it somewhere! – G-Wiz Mar 04 '10 at 02:47
  • @gWiz: Where do folks normally host these types of things? I don't normally do the whole open source thing, although this has actually become quite a useful library. The real magic is in the method interception library, which I'm using to decorate methods with a `CacheAttribute` and have it automatically allocate cache slots based on the arguments. I'm happy to put it up somewhere, I just have no experience with source distribution. :P – Aaronaught Mar 04 '10 at 02:58
  • That's a good question. If you want to set it up to have a fully-hosted project infrastructure, there's certainly codeplex.com or sourceforge.net. I believe both use svn and require a bit of work to setup. But if you just want to host the file somewhere for reference (and if you don't have a blog) you can simply upload it to Google Docs (and change the sharing permissions to not require users to sign-in to view it). – G-Wiz Mar 04 '10 at 07:04
  • @Aaronaught: did you finally upload this somewhere? I find this interesting. – Mauricio Scheffer Sep 23 '10 at 21:18
  • @Aaronaught It would be great if you could share your solution – anchandra Mar 11 '11 at 14:51