I have a list of files, where each file contains a list of Foo data. The same piece of Foo data (e.g. Id = 1) might exist in multiple files, but the more recent piece of data should overwrite an existing one.

I'm just reading each piece of data into an in-memory collection.

if !cache.HasKey(foo.Id) then Add
else if cache[foo.Id].UpdatedOn < foo.UpdatedOn then Update
else do nothing

When I'm reading in the files (because there are a few of them), I'm also using Parallel.ForEach(files, file => { .. });

I'm not sure how to do this.

I was thinking of using a ConcurrentDictionary, but I wasn't sure how to do an AddOrUpdate with a where-clause-style condition.

Any suggestions?

Pure.Krome

2 Answers

You can use a ConcurrentDictionary, like so:

dictionary.AddOrUpdate(foo.Id, foo, (id, existing) => 
    existing.UpdatedOn < foo.UpdatedOn ? foo : existing);
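
To show how that call might fit into the `Parallel.ForEach` loop from the question, here is a minimal sketch. The `Foo` class, the `FooCache.Build` helper, and the idea of modeling each file as an already-parsed sequence of `Foo` are all assumptions made for illustration; only `Id` and `UpdatedOn` come from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical Foo type: the question only mentions Id and UpdatedOn.
public sealed class Foo
{
    public int Id { get; set; }
    public DateTime UpdatedOn { get; set; }
}

public static class FooCache
{
    // Hypothetical helper: merges all "files" into one dictionary,
    // keeping the Foo with the latest UpdatedOn for each Id.
    // Each file is modeled as an already-parsed sequence of Foo.
    public static ConcurrentDictionary<int, Foo> Build(
        IEnumerable<IEnumerable<Foo>> files)
    {
        var cache = new ConcurrentDictionary<int, Foo>();

        Parallel.ForEach(files, file =>
        {
            foreach (var foo in file)
            {
                // Add if the key is missing; otherwise keep whichever is newer.
                cache.AddOrUpdate(foo.Id, foo, (id, existing) =>
                    existing.UpdatedOn < foo.UpdatedOn ? foo : existing);
            }
        });

        return cache;
    }
}
```

Because the update delegate may be invoked more than once for a single call, it should stay free of side effects, as it is here.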

Due to the discussion in the comments below, I will explain why there's no race condition here. This MSDN article discusses how value factories are run, and mentions that:

Therefore, it is not guaranteed that the data that is returned by GetOrAdd is the same data that was created by the thread's valueFactory.

This makes sense, since the designers of the concurrent dictionary didn't want user code to lock the dictionary for who knows how long, rendering it useless. Instead, what AddOrUpdate does is run in two nested loops. Here's some pseudo-code:

do { 
   while (!TryGetValue(key, out value))
       if (TryAdd(key, addValue)) return;
   newValue = updateValueFactory(key, value);
} while (!TryUpdate(key, newValue, value));

TryUpdate acquires the lock for the specific bucket, compares the current value to the retrieved value, and only if they match performs the update. If this fails, the outer loop happens again, TryGetValue returns the latest value, the value factory is called again, and so forth.

So it is assured that the value factory will always have the latest value if the update succeeds.
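
The retry loop described above can be written as compilable C# using only the public `TryGetValue`, `TryAdd` and `TryUpdate` methods. This is a sketch of the idea, not the framework's actual source; the `AddOrUpdateSketch` extension method and its containing class are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public static class ConcurrentDictionarySketch
{
    // Hypothetical helper that mirrors the pseudo-code; it is NOT the
    // real implementation of ConcurrentDictionary.AddOrUpdate.
    public static TValue AddOrUpdateSketch<TKey, TValue>(
        this ConcurrentDictionary<TKey, TValue> dict,
        TKey key,
        TValue addValue,
        Func<TKey, TValue, TValue> updateValueFactory)
    {
        while (true)
        {
            TValue current;
            if (dict.TryGetValue(key, out current))
            {
                // Compute the replacement outside of any dictionary lock.
                TValue newValue = updateValueFactory(key, current);

                // Succeeds only if the stored value still equals 'current'.
                if (dict.TryUpdate(key, newValue, current))
                    return newValue;
            }
            else if (dict.TryAdd(key, addValue))
            {
                return addValue;
            }
            // Another thread changed the entry in between: retry with the fresh value.
        }
    }
}
```

If another thread changes the entry between `TryGetValue` and `TryUpdate`, the delegate simply runs again against the newer value, which is why an older `UpdatedOn` cannot overwrite a newer one.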

Eli Arbel
  • A race condition exists - to prevent, wrap the above code with a `lock( foo.Id )` block: "ConcurrentDictionary is designed for multithreaded scenarios. You do not have to use locks in your code to add or remove items from the collection. However, it is always possible for one thread to retrieve a value, and another thread to immediately update the collection by giving the same key a new value." from MSDN "How to: Add and Remove Items from a ConcurrentDictionary" – Moho Feb 19 '14 at 10:45
  • @Moho, what race condition? – Grzenio Feb 19 '14 at 10:46
  • it is possible that two threads with the same `foo.Id` key value execute the `AddOrUpdate` method concurrently. – Moho Feb 19 '14 at 10:47
  • Ha, true. Can't lock on an int. There is a race condition, though. Two threads with same `foo.Id` value but different `foo.UpdatedOn` values. An older `UpdatedOn` could be updated after a later `UpdatedOn` value – Moho Feb 19 '14 at 10:50
  • http://msdn.microsoft.com/en-us/library/dd997369 Read the first paragraph after the example code – Moho Feb 19 '14 at 10:52
  • second paragraph too - "not all methods are atomic, specifically GetOrAdd and AddOrUpdate" – Moho Feb 19 '14 at 10:53
  • Well I tested it and it does in fact prevent a race condition. It will re-execute the `updateValueFactory` delegate if the value is changed during its execution. I'll post test code in an answer – Moho Feb 19 '14 at 11:33

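Interesting behavior in the ConcurrentDictionary.AddOrUpdate method: it re-executes the updateValueFactory delegate if the value is changed while the delegate is running. Test code: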

using System;
using System.Threading.Tasks;

class Program
{
    static void Main( string[] args )
    {
        var cd = new System.Collections.Concurrent.ConcurrentDictionary<int, int>();

        var a = 0;
        var b = 1;
        var c = 2;

        cd[ 1 ] = a;

        Task.WaitAll(
            Task.Factory.StartNew( () => cd.AddOrUpdate( 1, b, ( key, existingValue ) =>
                {
                    Console.WriteLine( "b" );
                    if( existingValue < b )
                    {
                        Console.WriteLine( "b update" );
                        System.Threading.Thread.Sleep( 2000 );
                        return b;
                    }
                    else
                    {
                        Console.WriteLine( "b no change" );
                        return existingValue;
                    }
                } ) ),

            Task.Factory.StartNew( () => cd.AddOrUpdate( 1, c, ( key, existingValue ) =>
            {
                Console.WriteLine( "c start" );
                if( existingValue < c )
                {
                    Console.WriteLine( "c update" );
                    return c;
                }
                else
                {
                    Console.WriteLine( "c no change" );
                    return existingValue;
                }
            } ) ) );

        Console.WriteLine( "Value: {0}", cd[ 1 ] );

        var input = Console.ReadLine();
    }
}

Results: [screenshot: ConcurrentDictionary.AddOrUpdate test output]

Moho