8

If I have the following code:

var dictionary = new ConcurrentDictionary<int, HashSet<string>>();

foreach (var user in users)
{
   if (!dictionary.ContainsKey(user.GroupId))
   {
       dictionary.TryAdd(user.GroupId, new HashSet<string>());
   }

   dictionary[user.GroupId].Add(user.Id.ToString());
}

Is the act of adding an item into the HashSet inherently thread safe because HashSet is a value property of the concurrent dictionary?

Camilo Terevinto
  • 31,141
  • 6
  • 88
  • 120
Marko
  • 12,543
  • 10
  • 48
  • 58

2 Answers2

7

No. Putting a container in a thread-safe container does not make the inner container thread safe.

dictionary[user.GroupId].Add(user.Id.ToString());

is calling HashSet's add after retrieving it from the ConcurrentDictionary. If this GroupId is looked up from two threads at once this would break your code with strange failure modes. I saw the result of one of my teammates making the mistake of not locking his sets, and it wasn't pretty.

This is a plausible solution. I'd do something different myself but this is closer to your code.

if (!dictionary.ContainsKey(user.GroupId))
{
    dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
var groups = dictionary[user.GroupId];
lock(groups)
{
    groups.Add(user.Id.ToString());
}
Joshua
  • 40,822
  • 8
  • 72
  • 132
  • @Joshua Why do you say that you would do something different? What's wrong with locking the hash set like this? Is there a better way to do it? – Marko Nov 20 '18 at 01:14
  • @Marko At work we have a more powerful primitive to build this from. The locking isn't bad, it's just slightly suboptimal and a lot easier to understand. – Joshua Nov 20 '18 at 01:23
5

No, the collection (the dictionary itself) is thread-safe, not whatever you put in it. You have a couple of options:

  1. Use AddOrUpdate as @TheGeneral mentioned:

    dictionary.AddOrUpdate(user.GroupId,  new HashSet<string>(), (k,v) => v.Add(user.Id.ToString());
    
  2. Use a concurrent collection, like the ConcurrentBag<T>:

    ConcurrentDictionary<int, ConcurrentBag<string>>
    

Whenever you are building the Dictionary, as in your code, you should be better off accessing it as little as possible. Think of something like this:

var dictionary = new ConcurrentDictionary<int, ConcurrentBag<string>>();
var grouppedUsers = users.GroupBy(u => u.GroupId);

foreach (var group in grouppedUsers)
{
    // get the bag from the dictionary or create it if it doesn't exist
    var currentBag = dictionary.GetOrAdd(group.Key, new ConcurrentBag<string>());

    // load it with the users required
    foreach (var user in group)
    {
        if (!currentBag.Contains(user.Id.ToString())
        {
            currentBag.Add(user.Id.ToString());
        }
    }
}
  1. If you actually want a built-in concurrent HashSet-like collection, you'd need to use ConcurrentDictionary<int, ConcurrentDictionary<string, string>>, and care either about the key or the value from the inner one.
Camilo Terevinto
  • 31,141
  • 6
  • 88
  • 120
  • Does ConcurrentBag behave as HashSet in other words it does not allow for duplicates in the list? – Marko Nov 20 '18 at 00:38
  • @Marko No, it behaves more like a concurrent list – Camilo Terevinto Nov 20 '18 at 00:41
  • @CamiloTerevinto The only problem I see with ConcurrentBag is how can I remove items from the bag? There is no Remove... looks like I need to use TryTake? – Marko Nov 20 '18 at 01:08
  • 2
    The first solution is incorrect. According to the [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.addorupdate) *"the updateValueFactory delegate is called outside the locks, to avoid the problems that can arise from executing unknown code under a lock."* So the first solution could potentially lead to corruption of the `HashSet`, or to lost updates. – Theodor Zoulias Sep 11 '20 at 01:54
  • 1
    @MichaelRandall I am reviewing the questions in the tag [concurrentdictionary](https://stackoverflow.com/questions/tagged/concurrentdictionary). Btw [here](https://referencesource.microsoft.com/mscorlib/system/Collections/Concurrent/ConcurrentDictionary.cs.html#490ee025e9652769) is the source code of the `AddOrUpdate` method. Not only the `updateValueFactory` is invoked without synchronization, but it can also be invoked multiple times in case of contention. – Theodor Zoulias Sep 11 '20 at 02:17
  • 2
    The second solution is incorrect too. The `Contains`+`Add` combination is not atomic. The `Contains` is not even thread-safe, because *"members accessed through one of the interfaces the ConcurrentBag implements, including extension methods, are not guaranteed to be thread safe"* ([reference](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentbag-1#thread-safety)). Besides that, `Contains` is an O(N) operation, so not suitable for replacing a `HashSet`. – Theodor Zoulias Sep 11 '20 at 02:29