3

I have a ConcurrentDictionary that has as key a long and as value a hashset of int. I want that if the key isn't in the dictionary, add a new hashset with the first element. If the key exists, add the new element to the existing dictionary.

I am trying something like that:

ConcurrentDictionary<long, HashSet<int>> myDic = new ConcurrentDictionary<long, HashSet<int>>();
int myElement = 1;
myDic.AddOrUpdate(1, new Hashset<int>(){myFirstElement},
(key, actualValue) => actualValue.Add(myElement));

The problem with this code is the third parameter, because .Add() method returns a bool and the AddOrUpdate expects a hashset. The first and second parameters are right.

So my question is how I can add a new element to the hashset in thread-safe way and avoid duplicates (it is the reason why I am using a hashset as value). The problem of the hashset is that it is not thread-safe and if I get it first and later add the new element, I am doing outside of the dictionary and I could have problems.

Thanks.

Álvaro García
  • 18,114
  • 30
  • 102
  • 193

2 Answers2

3

To fix compiler error you can do this:

myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
    (key, actualValue) => {
        actualValue.Add(myFirstElement);
        return actualValue;
    });

BUT this is not thread safe, because "update" function is not run inside any lock so you are potentially adding to not-thread-safe HashSet from multiple threads. This might result in (for example) losing values (so you were adding 1000 items to HashSet but in the end you have only 970 items in it for example). Update function in AddOrUpdate should not have any side effects and here it does.

You can lock yourself over adding values to HashSet:

myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
    (key, actualValue) => {
        lock (actualValue) {
            actualValue.Add(myFirstElement);
            return actualValue;
        }
    });

But then question is why you are using lock-free structure (ConcurrentDictionary) in the first place. Besides that - any other code might get HashSet from your dictionary and add value there without any locks, making the whole thing useless. So if you decide to go that way for some reason - you have to ensure that all code locks when accessing HashSet from that dictionary.

Instead of all that - just use concurrent collection instead of HashSet. There is no ConcurrentHashSet as far as I know but you can use another ConcurrentDictionary with dummy keys as a replacement (or look over internet for custom implementations).

Side note. Here

myDic.AddOrUpdate(1, new Hashset<int>(){myFirstElement}, 

you create new HashSet every time when calling AddOrUpdate, even if that dictionary is not needed because key is already there. Instead use overload with add value factory:

myDic.AddOrUpdate(1, (key) => new HashSet<int>() { myFirstElement },

Edit: sample usage of ConcurrentDictionary as hash set:

var myDic = new ConcurrentDictionary<long, ConcurrentDictionary<int, byte>>();
long key = 1;
int element = 1;
var hashSet = myDic.AddOrUpdate(key, 
    _ => new ConcurrentDictionary<int, byte>(new[] {new KeyValuePair<int, byte>(element, 0)}),
    (_, oldValue) => {
        oldValue.TryAdd(element, 0);
        return oldValue;
    });
Evk
  • 98,527
  • 8
  • 141
  • 191
  • Thank you so much for your tips. Could you wirte an example how I could use a CouncurrentDictionary as value with dummy keys? Thanks. – Álvaro García Oct 31 '17 at 10:53
  • In this example. Between I get the hashset and I try to add the new element, someone could delete it, so when I try to add I could get an error., or at least I would add an element to a hashet that there is no in the dictionary. Is it correct? – Álvaro García Oct 31 '17 at 11:39
  • @ÁlvaroGarcía Yes that is correct (no error but you will add to hashset no longer there). If that is a problem for you - I think you can use AddOrUpdate as you did originally (with ConcurrentDictionary instead of HashSet of course). – Evk Oct 31 '17 at 11:51
  • I was thinking that, do the first way with your side note and change dictionary instead of hashset. Thank you so much. – Álvaro García Oct 31 '17 at 11:58
1

If you wrap the anonymous function definition in curly braces, you can define multiple statements in the body of the function and thus specify the return value like this:

myDic.AddOrUpdate(1, new HashSet<int>() { myFirstElement },
(key, actualValue) => {
    actualValue.Add(myElement);
    return actualValue;
});
Conyc
  • 460
  • 1
  • 4
  • 10