
C++ expressions do not define an order of evaluation for their operands. This is for the sake of potential optimizations.

For the very simple case:

int i = f() + g();

Do such optimizations include evaluating `f()` and `g()` on different cores? And if such optimizations are possible, does it mean that the order of evaluation is runtime-dependent?

chetzacoalt
  • This depends on the content of `f()` and `g()`. If they spawn threads or processes internally then it very much is. Can you elaborate more on what the contents of `f` and `g` are? – Fantastic Mr Fox May 27 '20 at 14:36
  • my question is at a general level, as the rule of C++ is: order is undefined, whatever the operands are. So what I wish to know is what kind of optimizations the implementation may or may not choose to perform. More specifically: can the expression be split over the available cores? – chetzacoalt May 27 '20 at 14:48
  • @chetzacoalt They won't use different cores simultaneously unless they do something that can use different cores. A simple case I can think of where the compiler might change up the order is `f(g()) + g()` if the compiler can prove `g()` has no side effects. It might elide one invocation, which would necessarily compute the right operand before the left. – cdhowie May 27 '20 at 14:52
  • I believe that it could in principle, on a hypothetical system that supported multiple cores per thread, but I also think that it would require a lot of synchronisation (every memory access is now a potential race condition) and thus very likely be a pessimisation. – molbdnilo May 27 '20 at 14:53
  • No, threads would have to be synchronized in order to continue execution. And synchronization is costly. That would be an anti-optimization in 99% of cases. So that's a pragmatic reason. Whether the standard allows it is a different story. Not sure. – freakish May 27 '20 at 14:54
  • @cdhowie thanks, I see that case would explain why choosing a specific order. but that does not imply any runtime dependence, as the chosen order would always be "g before f". Am I right ? – chetzacoalt May 27 '20 at 14:55
  • @chetzacoalt Right. Assuming that the compiler elides a call to `g()` then the RHS would have to be computed before the LHS because `f` can't be called before `g` is. This is why the standard doesn't require a specified order. – cdhowie May 27 '20 at 14:57
  • @freakish I agree that `f` and `g` would have to be "proven" or detected to be time-consuming and without side effects on each other for the optimization to be fruitful. Nevertheless, it may be the case. – chetzacoalt May 27 '20 at 14:58
  • @chetzacoalt There is no reason for `g` to be proven time-consuming to have a redundant call optimized away. A redundant call is a redundant call. – cdhowie May 27 '20 at 14:58
  • @chetzacoalt it is extremely difficult (if possible at all) for a static compiler to predict how long a call will take. Maybe except some trivial cases like a sum. But for example no syscall can be assumed to be fast or slow. Even worse: `f` and `g` may not be thread-safe. That's even harder for a compiler to detect. Now that I think about it, thread safety is a serious issue. The standard should not allow implicit threading. – freakish May 27 '20 at 16:24

2 Answers


Does such optimizations include evaluating f() and g() on different cores?

Yes, though I doubt it happens in practice:

  • starting a thread is costly
  • threaded code has constraints; data races and other threading issues must not be introduced

A more probable optimization involves inlining and instruction reordering (some values may already be in a register, in cache, ...).

and if such optimizations are possible, does it mean that the order of evaluation is runtime dependent?

We can read on cppreference's evaluation_order page:

Order of evaluation of any part of any expression, including order of evaluation of function arguments is unspecified (with some exceptions listed below). The compiler can evaluate operands and other subexpressions in any order, and may choose another order when the same expression is evaluated again.

The order of evaluation might change at each evaluation, so it may depend on the runtime.

Jarod42

The word "to imply" has a very clear and technical meaning, and its use in everyday language matches the technical language (so long as we don't let the populace stomp the reason out, that is). Implication means that "if A, then B too". This means "if A, then always B too". It doesn't mean "when the weather's good" :)

There's no implication as stated, since here, A is "evaluation order optimizations" and B is "using different cores for different operands". And evaluation order optimizations almost never lead to use of different cores, although they may well lead to use of parallel execution units within a single pseudo-serial thread of execution. Modern CPUs already do a lot of parallelization automatically, and a good code generator can really allow the parallel execution units to shine (ahem, get hot).

Now, if what you ask is whether the operands could be evaluated on separate cores: in general - NO. Such a transformation would require the operands to be mutually thread-safe, i.e. that they cannot, ever, in any circumstances, modify shared state unsynchronized, since that would be a data race - clear undefined behavior.

  1. Compilers can - in limited circumstances - prove that the operands in fact don't modify shared state. They have to do such "reasoning" to do everyday optimizations. Alias analysis is one example of this. That's a positive.

  2. Given the cost of multi-thread dispatch, the evaluation of the operands would require a substantial amount of work to be worth dispatching to worker threads. So, the compiler would need to "prove" that the amount of work to be parallelized is such that the overheads of parallelization won't dwarf the benefits.

  3. The compiler could - in very limited circumstances - prove that mutual exclusion could be added to protect the shared modified state, without introducing deadlocks. Thus, it could add mutexes "on the fly". In practice, those would be spinlocks, as worker dispatch threads shouldn't be stalled (blocked).

  4. Given the overhead of synchronization, the compiler would also need to show that the synchronization is infrequent enough that its overhead would be acceptable.

Doing all of the above well enough to be worth the trouble is still somewhat beyond the means of any single existing production compiler, and is subject of intensive research. There are proofs-of-concept, but nothing in everyday use. This might change quickly, though.

So - at the moment (mid-2020) - the answer is still NO, in practice.

Alas, we got somewhat distracted from the real reason the evaluation order is undefined: it provides the compiler with opportunities to generate better code. Better "serial" code, that is - though even that qualifier isn't quite accurate: the "serial" code that runs on a single CPU thread still uses parallel execution units. So, in practice, the compiler can and does parallelize "serial" code - it's just done without involving multiple threads. Freedom to reorder evaluation enables other optimizations that reduce register pressure, improve utilization of the CPU's execution units through better instruction scheduling and vectorization, reduce the impact of data dependencies, etc.

Kuba hasn't forgotten Monica