11

I'm using ConcurrentDictionary to cache data with parallel access and sometimes new items can be stored in db and they are not loaded into cache. This is reason why I use GetOrAdd

public User GetUser(int userId)
{
    return _user.GetOrAdd(userId, GetUserFromDb);        
}

private User GetUserFromDb(int userId)
{
    var user = _unitOfWork.UserRepository.GetById(userId);

    // if user is null, it is stored to dictionary

    return user;
}

But how I can check if user was get from db and store user to dictionary only if user is not null?

Possibly I can remove null from ConcurrentDictionary immediately after GetOrAdd but it doesn't look thread safe and it is not very elegant solution. Useless insert and remove from dictionary. Do you have any idea how to do it?

y0j0
  • 3,369
  • 5
  • 31
  • 52
  • 2
    You can `TryRemove` the added value if `GetOrAdd` returns null. Not a very beautiful solution though. – Nikolai Samteladze Jul 25 '15 at 16:45
  • 1
    If the lookup from the database returns null, can the user appear in the database later, is that what you want to handle? If not, why does it matter that the userid is stored in the dictionary if not found, the next time that userid is requested, the dictionary has cached that it does not exist so it does not need to ask the database. – Lasse V. Karlsen Jul 25 '15 at 17:16
  • @y0io what did you end up using? – Nikolai Samteladze Jul 26 '15 at 02:52
  • @Nikolai Samteladze I will return to this problem later, but probably your solution with lock in extension method will win ;) Thank you a lot – y0j0 Jul 26 '15 at 14:43

3 Answers3

3

Here's a hacky solution, I hope something better is possible. Make GetUserFromDb throw if the user is not found. This aborts the store into the dictionary. Make GetUser catch the exception. This is using exceptions for control flow which is not nice.

usr
  • 168,620
  • 35
  • 240
  • 369
3
public User GetUser(int userId)
{
    var user = _user.GetOrAdd(userId, GetUserFromDb);
    if (user == null) _user.TryRemove(userId, out user);    
}

You can also wrap that into an extension method:

public static TValue GetOrAddIfNotNull<TKey, TValue>(
    this ConcurrentDictionary<TKey, TValue> dictionary,
    TKey key, 
    Func<TKey, TValue> valueFactory) where TValue : class
{
    var value = dictionary.GetOrAdd(key, valueFactory);
    if (value == null) dictionary.TryRemove(key, out value);
    return value;
}

Then your code will look like:

public User GetUser(int userId)
{
    var user = _user.GetOrAddIfNotNull(userId, GetUserFromDb)   
}

UPDATE

As per @usr comment, there might be a case when:

  1. Thread 1 executes GetOrAdd, adds null to the dictionary and pauses.
  2. User is added to the database.
  3. Thread 2 executes GetOrAdd and retrieves null from the dictionary instead of hitting the database.
  4. Thread 1 and Thread 2 execute TryRemove and remove record from the dictionary.

With this timing, Thread 2 will get null instead of hitting the database and getting the user record. If this edge case matters to you and you still want to use ConcurrentDictionary, then you can use lock in the extension method:

public static class ConcurrentDictionaryExtensions
{
    private static readonly object myLock = new object();

    public static TValue GetOrAddIfNotNull<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> dictionary,
        TKey key, 
        Func<TKey, TValue> valueFactory) where TValue : class
    {
        lock (myLock)
        {
            var value = dictionary.GetOrAdd(key, valueFactory);
            if (value == null) dictionary.TryRemove(key, out value);
            return value;
        }
    }
}
Nikolai Samteladze
  • 7,699
  • 6
  • 44
  • 70
  • thank you, but is it thread safe? take hypothetical situation that after GetOrAdd some other process adds new User to Dictionary then TryRemove removes existing User. Very small chance but it can happen – y0j0 Jul 25 '15 at 17:00
  • 1
    This is racy in the sense that two threads might run GetOrAddIfNotNull and when the user does not exist only one of them might hit the database. Sounds like a benign race. @y0io yes, that's true, too. – usr Jul 25 '15 at 17:00
  • @y0io not sure how another process can add user to dictionary because there is already a user with that `userId` there after `GetOrAdd` executed. Can you please explain? @usr I agree that the second thread might not hit the database. But that assumes that user was added to the database between `GetOrAdd` and `TryRemove` on the first thread. The questions is whether you need that precision and what are the consecuenses of not retrieving this user from the DB. – Nikolai Samteladze Jul 25 '15 at 17:17
  • @Nikolai Samteladze yes you are right .. but another process can get null and will not go to db to look for already added User - next process probably will go to db because null will be already deleted. But I agree that it will be not point and will not influence behavior of system. Maybe solution would be use lock in your extension method, wouldn't? – y0j0 Jul 25 '15 at 17:33
  • yeap, you can, just udpated the answer. What do you think? @usr – Nikolai Samteladze Jul 25 '15 at 17:44
  • 4
    It's no longer a *concurrent* dictionary with this global lock. – usr Jul 25 '15 at 17:47
  • @usr not sure what you mean by that... Can you please explain? I am just bundling 2 atomic operations into 1 atomic operation. – Nikolai Samteladze Jul 25 '15 at 17:49
  • 2
    OK. ConcurrentDictionary allows for parallel operations on independent keys. Now, there is a global lock. Only one thread can access the dictionary at a time. That's safe but does not scale. There is no point in using CD anymore. Dictionary would now be more appropriate under this model. – usr Jul 25 '15 at 18:04
  • What usr said, since ConcurrentDictionary slows down performance, because of internal locking etc. A regular dictionary would have slightly better performance, especially if you're going to use one global lock. – Bauss Jul 25 '15 at 18:05
  • @usr Ok, not I see what you mean. Appreciate the explanation. I am not sure I follow why only one thread can access the dictionary though. Only one thread can access `GetOrAddIfNotNull` operation, but why other threads can't perform other operations on the dictionary (e.g. regular `TryGet`)? – Nikolai Samteladze Jul 25 '15 at 18:08
  • @Bauss I agree that regular `Dictionary` will have less overhead. But then you will have to implement all the other operations that `ConcurrentDictionary` provides in case you need them. – Nikolai Samteladze Jul 25 '15 at 18:12
  • It does not implement anything else that you would not have in a regular `Dictionary`. Common functions in a `ConcurrentDictionary` are `TryAdd`, `TryGetValue`, `TryRemove` and `TryUpdate`, whom can easily be implemented to a regular Dictionary with extension methods. Ex. `TryAdd` could be like `if (dict.ContainsKey(key)) dict.Remove(key); dict.Add(key, value);` etc. In this scenario it doesn't seem like it matter. At last remember a concurrent dictionary is meant to be concurrent. – Bauss Jul 25 '15 at 18:19
  • @Bauss I agree that implementing `ConcurrentDictionary` methods yourself is not a big deal. I am not sure that I follow your "meant to be concurrent" argument. Why the proposed extension method makes the whole dictionary non-concurrent? – Nikolai Samteladze Jul 25 '15 at 18:45
  • Because the global lock makes any thread unable to access the dictionary. Concurrent programming is meant to be non-blocking and lock-less. – Bauss Jul 25 '15 at 18:46
  • @Bauss I understand where you are coming from. `ConcurrentDictionary` locks a bucket and doesn't lock the whole operation. I am not sure why `lock` in this extension method would make the whole dictionary non-concurrent though. Only one thread can access `GetOrAddIfNotNull` operation, but why other threads can't perform other operations on the dictionary (e.g. regular `TryGet`)? – Nikolai Samteladze Jul 25 '15 at 18:54
  • Because you lock the reference to the dictionary and not just the function. GetOrAddIfNotNull is not actually implemented into the dictionary; pardon me if I'm wrong, but I believe that the compiler would rewrite: `_user.GetOrAddIfNotNull(userId, GetUserFromDb)` to `GetOrAddIfNotNull(_user, userId, GetUserFromDb)` – Bauss Jul 25 '15 at 18:57
  • True, this is only locking GetOrAddIfNotNull but that's the only operation being performed here. That's why this amounts to a global lock. – usr Jul 25 '15 at 19:17
  • @Bauss I think `lock (myLock)` locks just the code in brackets... Not sure why would it lock the whole dictionary. May be you can post a link? – Nikolai Samteladze Jul 25 '15 at 19:37
  • Not saying it locks the dictionary. It locks the code path in GetOrAddIfNotNull which is 100% of the code operating on the dictionary. – usr Jul 25 '15 at 20:15
3

I am extending @NikolaiSamteladze solution to include double-checked locking so that other threads can skip acquiring lock after dictionary updation

public static class ConcurrentDictionaryExtensions
{
    private static readonly object myLock = new object();

    public static TValue GetOrAddIfNotNull<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> dictionary,
        TKey key,
        Func<TKey, TValue> valueFactory) where TValue : class
    {
        TValue value;
        if (!dictionary.TryGetValue(key, out value))
        {
            lock (myLock)
            {
                value = dictionary.GetOrAdd(key, valueFactory);
                if (value == null) dictionary.TryRemove(key, out value);
            } 
        }
        return value;
    }
}
Abdul Rauf
  • 5,798
  • 5
  • 50
  • 70
  • I've given this +1 but my own (limited) testing suggests it doesn't make a big speed improvement. – debater Sep 25 '17 at 16:36
  • @debater it does make a lot of difference if you are getting values most of the time instead of adding them because it will skip lock acquisition if TryGetValue returns value. This design pattern is used specifically for this purpose. https://en.wikipedia.org/wiki/Double-checked_locking – Abdul Rauf Sep 25 '17 at 17:39