
On the cppreference page on execution policies there is an example like this:

std::atomic<int> x{0};
int a[] = {1,2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  x.fetch_add(1, std::memory_order_relaxed);
  while (x.load(std::memory_order_relaxed) == 1) { } // Error: assumes execution order
});

As you can see, it is an example of (supposedly) erroneous code, but I do not really understand what the error is here; it does not seem to me that any part of the code assumes an execution order. AFAIK, the first thread to fetch_add will wait for the second one, but that's it, no problematic behaviour. Am I missing something, or is there really an error here?

bartop
  • Found a [duplicate](https://stackoverflow.com/q/58287969/7699037) – Mike van Dyke Oct 30 '19 at 11:41
  • Does this answer your question? [Example of misuse of std::memory\_order::relaxed in C++ Standard \[algorithms.parallel.exec/5 in n4713\]](https://stackoverflow.com/questions/58287969/example-of-misuse-of-stdmemory-orderrelaxed-in-c-standard-algorithms-para) – bartop Oct 30 '19 at 11:43
  • @MikevanDyke then flag it please, it will help others – bartop Oct 30 '19 at 11:43

1 Answer


> The execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized. The invocations of element access functions in parallel algorithms invoked with this policy (usually specified as std::execution::par) are permitted to execute in either the invoking thread or in a thread implicitly created by the library to support parallel algorithm execution. Any such invocations executing in the same thread are indeterminately sequenced with respect to each other.

As far as I can see, the issue here is that there is no guarantee about how many threads are used. If the implementation uses only a single thread, you get an endless loop: while (x.load(std::memory_order_relaxed) == 1) { } never completes.
So I guess the comment means that this code wrongly relies on multiple threads executing concurrently, so that fetch_add would be called a second time at some point and the loop would terminate.
The only guarantee you get is that, within each thread, the invocations are not interleaved (they are indeterminately sequenced with respect to each other).
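
To illustrate the difference, here is a minimal self-contained sketch (my own illustration, not code from cppreference) of a variant that makes no assumption about how many threads the library uses: each invocation only performs its own side effect and never waits on another invocation's progress, so it completes whether for_each runs everything on the calling thread or spreads the work across a pool.

```cpp
#include <algorithm>
#include <atomic>
#include <execution>
#include <iterator>

int main()
{
    std::atomic<int> x{0};
    int a[] = {1, 2};
    // Each invocation performs an independent side effect and does not
    // wait for progress made by any other invocation, so this is correct
    // for any number of threads the implementation chooses to use.
    std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
        x.fetch_add(1, std::memory_order_relaxed);
    });
    // for_each has returned, so both invocations have completed by now.
    return x.load(std::memory_order_relaxed) == 2 ? 0 : 1;
}
```

(Depending on the toolchain this may need extra linkage, e.g. TBB with libstdc++, but that is an implementation detail.)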

Moshe Gottlieb
  • I think you're correct. At least [this answer](https://stackoverflow.com/a/58288373/7699037) states the same. – Mike van Dyke Oct 30 '19 at 11:42
  • ( **+1** for the lovely cookie-policy banner ) Does "*The implementation runs the algorithm (`for_each`) in the **`par` policy***" actually mean that the implementation of said *`par`* policy is at most a "*just*"-`[CONCURRENT]` process execution, but **principally not** the true **`[PARALLEL]`** one? ... https://stackoverflow.com/revisions/27347539/3 – user3666197 Oct 30 '19 at 12:42
  • This is still wrong: there is no guarantee that any two element access functions are executed in the same thread **or** that they are executed in different threads. (The first non-guarantee requires the atomics; the second invalidates the `while`-wait.) – Davis Herring Oct 30 '19 at 13:40
  • @DavisHerring I thought that was implied from what I wrote; what did I get wrong? The implementation _may_ use whatever threads it wants to, in whatever order it feels like, but it must still make sure the invocations are not concurrent (indeterminately sequenced). – Moshe Gottlieb Oct 30 '19 at 13:46
  • @MosheGottlieb: But they can be concurrent! (You’re right that this isn’t quite implied by “might be in different threads”.) It’s only the ones that happen to be on the same thread that are (even) indeterminately sequenced. – Davis Herring Oct 30 '19 at 13:52
  • @DavisHerring So I guess my original answer was correct after all :-) – Moshe Gottlieb Oct 30 '19 at 15:26