1

I am trying to develop the following lock free code (c++11):

int val_max;
std::array<std::atomic<int>, 255> vector;

    if (vector[i] > val_max) {
    val_max =vector[i];
    }

The problem is that when there are many threads (128 threads), the result is not correct, because if for example val_max=1, and three threads ( with vector[i] = 5, 15 and 20) will execute the code in the next line, and it will be a data race..

So, I don't know the best way to solve the problem with the available functions in c++11 (I cannot use mutex, locks), protecting the whole code.

Any suggestions? Thanks in advance!

Javi
  • 11
  • 2
  • Describe what you want to achieve. – chill Dec 11 '13 at 15:59
  • 1
    Perhaps this will be solve your problem http://stackoverflow.com/questions/16190078/how-to-atomically-update-a-maximum-value – chill Dec 11 '13 at 16:05
  • 1
    The bigger problem is that even if you get rid of the data race, the contention on `val_max` will become a bottleneck and kill all performance gains that you might have expected from a multithreaded solution. Your algorithm seems flawed, but since you did not write what you were trying to achieve it is hard to assist you in fixing it. – ComicSansMS Dec 11 '13 at 16:18

2 Answers2

3

You need to describe the bigger problem you need to solve, and why you want multiple threads for this.

If you have a lot of data and want to find the maximum, and you want to split that problem, you're doing it wrong. If all threads try to access a shared maximum, not only is it hard to get right, but by the time you have it right, you have fully serialized accesses, thus making the entire thing an exercise in adding complexity and thread overhead to a serial program.

The correct way to make it parallel is to give every thread a chunk of the array to work on (and the array members are not atomics), for which the thread calculates a local maximum, then when all threads are done have one thread find the maximum of the individual results.

Sebastian Redl
  • 69,373
  • 8
  • 123
  • 157
  • As I wrote before, it is just a requirement (number of threads, and lock free). As you said, I gave to each thread a chunk of the array, but the problem is that, when there are 128 threads executing sometimes they execute the same line, and I have the wrong value... – Javi Dec 11 '13 at 17:51
  • 1
    They key is to give them local maxima, and at the end aggregate. If you can't have a separate thread to sum this up, you need to make the final max_val atomic and use a compare_exchange loop to set it. – Sebastian Redl Dec 11 '13 at 18:02
1
  1. Do an atomic fetch of val_max.
  2. If the value fetched is greater than or equal to vector[i], stop, you are done.
  3. Do an atomic compare exchange -- compare val_max to the value you read in step 1 and exchange it for the value of vector[i] if it compares.
  4. If the compare succeeded, stop, you are done.
  5. Go to step 1, you raced with another thread that made forward progress.
David Schwartz
  • 179,497
  • 17
  • 214
  • 278