Using std::atomic during parallel loop and update if condition meets requirements

Question

Here is an example that uses std::atomic together with std::for_each (C++17):

#include <iostream>
#include <atomic>
#include <vector>
#include <algorithm>
#include <execution>

int main()
{
    constexpr int vec_size = 2000000;
    std::vector<double> numbers(vec_size, 1.0);
    numbers[vec_size / 2] = 2.0;
    std::atomic<double> max_value{ 0.0 };
    std::for_each(std::execution::par_unseq, numbers.begin(), numbers.end(), [&max_value](double s) {
        if (s > max_value)
        {
            max_value = s;
        }
        });
    std::cout << "Max Value: " << std::fixed << max_value;
    return 0;
}

One thread could evaluate the condition and, before it performs the assignment, another thread could update max_value. How can this be done correctly?

I found this answer from 2013, and I wonder whether something like this is now available built in with C++17 or C++20.

The traditional solution would be to use a [mutex](https://en.cppreference.com/w/cpp/thread/mutex). — Jesper Juhl, Sep 15 '22 at 12:08
The chase for the lock-free unicorn fairy results in a failure more often than not. Only a mutex and a lock will guarantee proper synchronization in this case, but the performance would likely be worse than a simple loop in a single execution thread. — Sam Varshavchik, Sep 15 '22 at 12:08
`for_each` is not the tool for this job, and the standard library AFAIK doesn't have it. You can do this yourself though. Create a loop that creates chunks of your vector (iterator pairs). Then create a thread for each pair and have the thread do a linear search for the max value. Once you have the results from all of the threads, do one more linear traversal of those results to get the max of the maxes. This lets you avoid needing any thread synchronization except for waiting for the results of all threads. — NathanOliver, Sep 15 '22 at 12:11
@NathanOliver: I guess that the simplification of the code into a minimal reproducible example has accidentally produced code which could now be replaced by [max_element](https://en.cppreference.com/w/cpp/algorithm/max_element), but the actual problem is slightly different and for_each might be a good choice. — Thomas Weller, Sep 15 '22 at 12:13
https://en.cppreference.com/w/cpp/experimental/parallelism/existing — Jesper Juhl, Sep 15 '22 at 12:14
@ThomasWeller Woot. I forgot about `max_element`, I looked up `max` and saw it wasn't parallelized, forgot there was an algorithm version. — NathanOliver, Sep 15 '22 at 12:16
[compare_exchange](https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange) looks close, but only seems to compare for equality/inequality, not for greater. — Thomas Weller, Sep 15 '22 at 12:24
In 2012 [Herb Sutter wrote](https://herbsutter.com/2012/08/31/reader-qa-how-to-write-a-cas-loop-using-stdatomics/) that there is no such thing and he implemented it himself. — Thomas Weller, Sep 15 '22 at 12:35
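
The chunked approach described in NathanOliver's comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `chunked_max` is a hypothetical helper name, the input is assumed to be non-empty, and plain `std::thread` is used instead of an execution policy. Each thread writes only to its own slot in `partial`, so the only synchronization is the `join()` at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the standard library).
// Assumes `data` is non-empty and `num_chunks` is greater than zero.
double chunked_max(const std::vector<double>& data, std::size_t num_chunks) {
    assert(!data.empty() && num_chunks > 0);
    // Never use more chunks than elements, so every chunk is non-empty.
    num_chunks = std::min(num_chunks, data.size());
    const std::size_t chunk_size = data.size() / num_chunks;

    std::vector<double> partial(num_chunks);  // one result slot per thread
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < num_chunks; ++c) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(c * chunk_size);
        // The last chunk also takes the remainder of the division.
        auto last = (c + 1 == num_chunks)
                        ? data.end()
                        : first + static_cast<std::ptrdiff_t>(chunk_size);
        workers.emplace_back([first, last, &partial, c] {
            partial[c] = *std::max_element(first, last);  // linear search
        });
    }
    for (auto& w : workers) w.join();  // wait for all partial results

    // Final pass: the max of the per-chunk maxima.
    return *std::max_element(partial.begin(), partial.end());
}
```

With the vector from the question, `chunked_max(numbers, std::thread::hardware_concurrency())` would be one way to call it, keeping in mind that `hardware_concurrency()` may return 0 and would then need a fallback value.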

Maxim Egorushkin · Answer 1 · 2022-09-15T17:28:39.663

You can use a non-blocking busy-wait (a compare-and-swap loop) to update max_value from multiple threads without race conditions:

// ...
#include <emmintrin.h> // _mm_pause (x86 SSE2 intrinsic)

int main() {
    // ...
    std::atomic<double> max_value{ 0.0 };
    auto const update_max_value = [&max_value](double s) {
        // Retry only while s is still larger than the last value seen in max_value.
        for(double expected = max_value.load(std::memory_order_relaxed);
            expected < s && !max_value.compare_exchange_weak(expected, s);)
            _mm_pause(); // spin-wait hint to the CPU
    };
    std::for_each(std::execution::par_unseq, numbers.begin(), numbers.end(), update_max_value);
    // ...
}

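When the exchange fails, compare_exchange_weak reloads the current max_value into expected. The loop then re-evaluates expected < s against that fresh value, so it stops as soon as another thread has stored a value at least as large as s, and otherwise retries. The weak form may also fail spuriously, which the loop handles in the same way.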
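
To make the retry behaviour easier to try out in isolation, here is a minimal sketch of the same compare-and-swap loop wrapped in a free function and driven by plain `std::thread` workers. The names `atomic_fetch_max` and `parallel_max_with_cas` are hypothetical helpers introduced for illustration. They are not standard library functions, and as far as I know neither C++17 nor C++20 provides a built-in equivalent for `std::atomic<double>`. The `_mm_pause()` hint is left out to keep the sketch portable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: raise `target` to `value` if `value` is larger.
// Returns the value of `target` that this call last observed before
// either updating it or giving up.
inline double atomic_fetch_max(std::atomic<double>& target, double value) {
    double observed = target.load(std::memory_order_relaxed);
    while (observed < value &&
           !target.compare_exchange_weak(observed, value)) {
        // On failure, compare_exchange_weak has written the current value
        // of `target` into `observed`; the condition is then re-checked.
    }
    return observed;
}

// Hypothetical driver: each thread scans a strided slice of `data`
// and pushes every element through the CAS loop.
double parallel_max_with_cas(const std::vector<double>& data,
                             unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<double> max_value{0.0};  // same starting value as the question
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < data.size(); i += num_threads)
                atomic_fetch_max(max_value, data[i]);
        });
    }
    for (auto& w : workers) w.join();
    return max_value.load();
}
```

Because the starting value is 0.0, as in the question, this sketch only reports the true maximum when at least one element is greater than or equal to zero. A more general version would start from the lowest representable value or from the first element.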

Updating max_value from multiple threads is not necessary for this task. You can find the maximum value in the range using any number of threads (std::max_element overloads 2 and 4) and only then update max_value. This way only one thread ever loads and updates max_value, so no std::atomic is necessary:

double max_value = *std::max_element(std::execution::par_unseq, numbers.begin(), numbers.end());
