3

I would like to make the following code thread-safe. Unfortunately, I have tried locking at various levels within this code with no success. The only instance I can seem to achieve thread-safety is to place a lock around the entire loop which effectively makes the Parallel.ForEach no faster (possibly even slower) than just using foreach. The code is relatively/almost safe with no locking. It only seems to show slight variations in the summation of the geneTokens.Value[-1] keys and gtCandidates.Value[-1] keys about once out of every 20 or so executions.

I realize that Dictionary is not thread-safe. However, I cannot change this particular object to a ConcurrentDictionary without taking a major performance hit downstream. I would rather run this part of the code with a regular foreach than change that particular object. However, I am using ConcurrentDictionary to hold the individual Dictionary objects. I have also tried making this change, and it does not solve my race issue.

Here are my Class level variables:

//Holds all tokens derived from each sequence chunk
public static ConcurrentBag<sequenceItem> tokenBag = 
  new ConcurrentBag<sequenceItem>();
public BlockingCollection<sequenceItem> sequenceTokens = new 
  BlockingCollection<sequenceItem>(tokenBag);
public ConcurrentDictionary<string, int> categories = new 
  ConcurrentDictionary<string, int>();
public ConcurrentDictionary<int, Dictionary<int, int>> gtStartingFrequencies = new 
  ConcurrentDictionary<int, Dictionary<int, int>>();
public ConcurrentDictionary<string, Dictionary<int, int>> gtCandidates = new 
  ConcurrentDictionary<string, Dictionary<int, int>>();
public ConcurrentDictionary<string, Dictionary<int, int>> geneTokens = new 
  ConcurrentDictionary<string, Dictionary<int, int>>();

Here is the Parallel.ForEach:

Parallel.ForEach(sequenceTokens.GetConsumingEnumerable(), seqToken =>
{
  lock (locker)
  {
    //Check to see if the Sequence Token is a Gene Token
    Dictionary<int, int> geneTokenFreqs;
    if (geneTokens.TryGetValue(seqToken.text, out geneTokenFreqs))
    { //The Sequence Token is a Gene Token 


      *****************Race Issue Seems To Occur Here**************************** 
      //Increment or create category frequencies for each category provided
      int frequency;
      foreach (int category in seqToken.categories)
      {
        if (geneTokenFreqs.TryGetValue(category, out frequency))
        {   //increment the category frequency, if it already exists
            frequency++;
            geneTokenFreqs[category] = frequency;
        }
        else
        {   //Create the category frequency, if it does not exist
            geneTokenFreqs.Add(category, 1);
        }
      }

      //Update the frequencies total [-1] by the total # of categories incremented.
      geneTokenFreqs[-1] += seqToken.categories.Length;
      ******************************************************************************
    }
    else
    { //The Sequence Token is NOT yet a Gene Token
      //Check to see if the Sequence Token is a Gene Token Candidate yet
      Dictionary<int, int> candidateTokenFreqs;
      if (gtCandidates.TryGetValue(seqToken.text, out candidateTokenFreqs))
      {
        *****************Race Issue Seems To Occur Here****************************
        //Increment or create category frequencies for each category provided
        int frequency;
        foreach (int category in seqToken.categories)
        {
          if (candidateTokenFreqs.TryGetValue(category, out frequency))
          { //increment the category frequency, if it already exists
            frequency++;
            candidateTokenFreqs[category] = frequency;
          }
          else
          { //Create the category frequency, if it does not exist
            candidateTokenFreqs.Add(category, 1);
          }
        }

        //Update the frequencies total [-1] by the total # of categories incremented.
        candidateTokenFreqs[-1] += seqToken.categories.Length;
        *****************************************************************************

        //Only update the candidate sequence count once per sequence
        if (candidateTokenFreqs[-3] != seqToken.sequenceId)
        {
          candidateTokenFreqs[-3] = seqToken.sequenceId;
          candidateTokenFreqs[-2]++;

          //Promote the Token Candidate to a Gene Token, if it has been found >=
          //the user defined candidateThreshold
          if (candidateTokenFreqs[-2] >= candidateThreshold)
          {
            Dictionary<int, int> deletedCandidate;
            gtCandidates.TryRemove(seqToken.text, out deletedCandidate);
            geneTokens.TryAdd(seqToken.text, candidateTokenFreqs);
          }
        }
      }
      else
      {
        //create a new token candidate frequencies dictionary by making 
        //a copy of the default dictionary from
        gtCandidates.TryAdd(seqToken.text, new 
          Dictionary<int, int>(gtStartingFrequencies[seqToken.sequenceId]));
      }
    }
  }
});
Jake Drew
  • 2,230
  • 23
  • 29
  • There are other strange things in this code: how does it allow you to increment `frequency` without first initializing it? – Tudor Oct 07 '12 at 07:38
  • Works just fine since frequency is used as an "out" variable for geneTokenFreqs.TryGetValue(). The only time it is incremented is if the variable exists and is returned from TryGetValue... I assure you the code executes. I have been running it all night :) – Jake Drew Oct 07 '12 at 09:09
  • Ah sorry, I didn't see the `out` part. Then it's ok. – Tudor Oct 07 '12 at 09:15

2 Answers2

0

Clearly one data race comes from the fact that some threads will be adding items here:

geneTokens.TryAdd(seqToken.text, candidateTokenFreqs);

and others will be reading here:

if (geneTokens.TryGetValue(seqToken.text, out geneTokenFreqs))
Tudor
  • 61,523
  • 12
  • 102
  • 142
  • You are correct. I have removed them from above. They were actually there from a prior test run right before I posted this. I had previously and unsuccessfully tried using geneTokens.Contains() with the lock right above geneTokenFreqs = geneTokens[seqToken.text]; That approach did not solve my issue. – Jake Drew Oct 07 '12 at 07:12
  • @Jake Drew: I've made an edit with what I think the problem is. You could try to only lock these portions of the code. – Tudor Oct 07 '12 at 11:24
  • I'm sure how to effectively lock on TryAdd without wrapping a lock around the entire block which leads me back to the original question. I tried the following pattern with no luck as well: lock (locker) { isGeneToken = geneTokens.TryGetValue(seqToken.text, out geneTokenFreqs); } if(isGeneToken) – Jake Drew Oct 07 '12 at 16:58
-1

How i have used Concurrent Dictionary in my Project:

I m putting a flag in the dictionary and checking from another thread , that if flag is there or not. If flag present i do my task accordingly..

For this what i m doing is as :

1) Declaring Concurrent Dictionary 2)Adding the flag using TryADD method 3)trying to retrieve the flat using TryGet Methid.

1) Declaring

  Dim cd As ConcurrentDictionary(Of Integer, [String]) = New ConcurrentDictionary(Of Integer, String)()

2) Adding

If cd.TryAdd(1, "uno") Then
        Console.WriteLine("CD.TryAdd() succeeded when it should have failed")
        numFailures += 1
    End If 

3) Retrieving

 If cd.TryGetValue(1, "uno") Then
        Console.WriteLine("CD.TryAdd() succeeded when it should have failed")
        numFailures += 1
    End If 
Devendra Patel
  • 309
  • 2
  • 10