Constant folding/propagation optimization with memory barriers

Question

I have been reading for a while to better understand what is going on in multithreaded programming on a modern (multicore) CPU. While reading this, I noticed the code below in the "Explicit Compiler Barriers" section, which does not use volatile for the IsPublished global.

#define COMPILER_BARRIER() asm volatile("" ::: "memory")

int Value;
int IsPublished = 0;

void sendValue(int x)
{
    Value = x;
    COMPILER_BARRIER();          // prevent reordering of stores
    IsPublished = 1;
}

int tryRecvValue()
{
    if (IsPublished)
    {
        COMPILER_BARRIER();      // prevent reordering of loads
        return Value;
    }
    return -1;  // or some other value to mean not yet received
}

The question is: is it safe to omit volatile for IsPublished here? Many people say that the volatile keyword has little to do with multithreaded programming, and I agree with them. However, constant folding/propagation can be applied during compiler optimization, and as the wiki page shows, it is possible to change if (IsPublished) into if (false) if the compiler does not know who can change the value of IsPublished. Am I missing or misunderstanding something here?

Memory barriers can prevent compiler reordering and out-of-order execution by the CPU, but as I said in the previous paragraph, do I still need volatile to avoid constant folding/propagation, which is a dangerous optimization, especially when globals are used as flags in lock-free code?

Your wiki page doesn't state what you claim. A compiler that reduced `IsPublished` to `false` without **complete** knowledge of who can change the values would be severely broken. – user207421 Apr 29 '15 at 11:39
I suggest you use the C++ standard mechanisms for memory ordering if you really have to. Better yet, try to use the higher-level parallelization primitives in C++ and avoid going knee-deep in the really tricky memory ordering/visibility stuff; it is really the rocket science of programming. – Erik Alapää Apr 29 '15 at 11:40
Are you asking about C or C++? In C++, just make `IsPublished` atomic, and you'll get correct (although somewhat conservative) memory barriers. – Mike Seymour Apr 29 '15 at 11:42
@EJP: It's not a claim, but what I understood from the example on the wiki. It would be better if you could clarify when this optimization is possible and when it is not. – Deniz Apr 29 '15 at 11:46

score 0 · Accepted Answer · answered Apr 30 '15 at 08:17

If tryRecvValue() is called only once, it is safe to omit volatile for IsPublished. The same holds when, between calls to tryRecvValue(), there is a function call for which the compiler cannot prove that it leaves the false value of IsPublished unchanged.

// Example 1 (safe): a single call, so IsPublished is read exactly once.
int v = tryRecvValue();
if(v == -1) exit(1);

// Example 2 (unsafe): tryRecvValue() may be inlined, and IsPublished may not be re-read between iterations.
int v;
while(true)
{
    v = tryRecvValue();
    if(v != -1) break;
}

// Example 3 (safe): the opaque call forces IsPublished to be re-read on every iteration.
int v;
while(true)
{
    v = tryRecvValue();
    if(v != -1) break;
    some_extern_call(); // As far as the compiler can tell, this may change IsPublished
}
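
To make Example 2 more concrete, here is a rough sketch of what an optimizer is allowed to do once tryRecvValue() has been inlined into the loop. The function names pollAsWritten and pollAsMaybeCompiled are made up for illustration, and the second function is a hand-written approximation of one legal transformation, not actual compiler output. Whether a given compiler really does this depends on its version and optimization level.

```cpp
#define COMPILER_BARRIER() asm volatile("" ::: "memory")

int Value;
int IsPublished = 0;

// Example 2 with tryRecvValue() inlined by hand (hypothetical name).
int pollAsWritten()
{
    while (true)
    {
        if (IsPublished)
        {
            COMPILER_BARRIER();      // only reached when the flag is already set
            return Value;
        }
        // No barrier and no opaque call on this path, so nothing tells the
        // compiler that IsPublished might differ on the next iteration.
    }
}

// One transformation the compiler may apply (hypothetical name):
// the load of IsPublished is hoisted out of the loop and performed once.
int pollAsMaybeCompiled()
{
    if (IsPublished)
    {
        COMPILER_BARRIER();
        return Value;
    }
    for (;;) { }                     // spins forever, even if the flag is set later
}
```

Both functions behave the same when the flag is already set on entry; they differ only when another thread sets it afterwards, which is exactly the case the polling loop was written for.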

Constant propagation can be applied only when the compiler can prove the value of the variable. Because IsPublished is declared non-constant, its value can be proven only if all of the following hold:

1. The variable is assigned the given value, or a read of the variable is followed by a branch that is executed only when the variable has the given value.
2. The variable is read (again) in the same program thread.
3. Between 1 and 2, the variable is not changed within that program thread.
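
The three conditions can be illustrated with a small sketch. The global Flag and the function names foldable and notFoldable are hypothetical, and the comments describe what a compiler is permitted to do rather than what any particular compiler is guaranteed to emit. The second function uses the same GCC-style "memory" clobber as COMPILER_BARRIER() to break the third condition.

```cpp
int Flag = 0;   // hypothetical global, standing in for IsPublished

// All three conditions hold: the value is established by the store,
// the variable is read again in the same thread of execution, and
// nothing in between can modify it. The test may be folded to `false`.
int foldable()
{
    Flag = 0;
    if (Flag) return 1;              // may be compiled as if it were `if (false)`
    return 0;
}

// The third condition fails: the "memory" clobber tells the compiler that
// any memory may have changed, so Flag has to be re-read before the test.
int notFoldable()
{
    Flag = 0;
    asm volatile("" ::: "memory");
    if (Flag) return 1;              // a real load and compare is emitted
    return 0;
}
```

In a single-threaded run both functions return 0; the difference is only in whether the generated code still contains a load of Flag.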

Unless you call tryRecvValue() in some sort of .init function, the compiler will never see the initialization of IsPublished in the same thread as its read. So proving a false value for this variable from its initialization is not possible.

Proving a false value for IsPublished from the false (empty) branch in tryRecvValue() is possible; see Example 2 in the code above.
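
For comparison, the same publish/receive pattern can be sketched with C++11 atomics, as suggested in the comments on the question. This is a minimal sketch assuming a C++11 compiler; the choice of release/acquire ordering is an assumption made here and is not taken from the question's source. With an atomic flag the compiler must treat every load as observable, so the hoisting described in Example 2 does not apply.

```cpp
#include <atomic>

int Value;
std::atomic<bool> IsPublished{false};

void sendValue(int x)
{
    Value = x;
    // Release store: the write to Value cannot be reordered after it.
    IsPublished.store(true, std::memory_order_release);
}

int tryRecvValue()
{
    // Acquire load: the read of Value cannot be reordered before it.
    if (IsPublished.load(std::memory_order_acquire))
        return Value;
    return -1;  // not yet received
}
```

A polling loop such as the one in Example 2 can call this version of tryRecvValue() repeatedly without an extra external call.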

Tsyvarev

60,011
17
110
153

Thank you, but some points are still blurry to me. 1) Why is only a single call to `tryRecvValue()` safe? 2) When can the compiler not prove that `some_extern_call()` leaves the value of `IsPublished` unchanged? For example, when the source code (the definition of `some_extern_call()`) is not available? 3) Must all three conditions be true for the proof? 4) What exactly do you mean by "same program's thread": any thread created by the same program (process)? – Deniz Apr 30 '15 at 08:55
1) Since the compiler cannot prove the value of `IsPublished`, it emits instructions to read that value and check the result. That is all you need. 2) Yes, if the compiler cannot see the code of `some_extern_call()`, it cannot prove the variable's value after the call. 3) Yes, all three conditions must be true. 4) I mean the abstract executor of the program, which executes instructions in program order. You may interpret it as, e.g., a POSIX thread, but such threads do not exist at compile time. – Tsyvarev Apr 30 '15 at 09:25
The rules are a little hard to visualize, and I do not want to ask about every detail. Do they come from an algorithm, or are they implementation (compiler) specific? Maybe I could look at the resources you rely on? – Deniz Apr 30 '15 at 10:55
I studied memory barriers for the Linux kernel: http://www.mjmwired.net/kernel/Documentation/memory-barriers.txt. In that description you can treat ACCESS_ONCE() as an access (read or write) to a volatile variable. Actually, if you just want to program such things, the simple rule is to use `volatile` for every variable used as an interprocess lock-less communication flag. With an appropriate barrier, of course. Or, better, use the corresponding atomic library (std::atomic in C++). That library takes care of type qualifiers and barriers itself. – Tsyvarev Apr 30 '15 at 11:39
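
The "simple rule" from the last comment could look roughly like the sketch below, which is the question's code with the flag declared volatile. The waitForValue helper is hypothetical and added only to show a polling loop. Note the assumptions: COMPILER_BARRIER() constrains the compiler only, so a weakly ordered CPU may still need a hardware fence, and under the C++11 memory model unsynchronized access to a non-atomic flag is formally a data race, which is why std::atomic is the portable choice.

```cpp
#define COMPILER_BARRIER() asm volatile("" ::: "memory")

int Value;
volatile int IsPublished = 0;    // volatile: every read in the source is really performed

void sendValue(int x)
{
    Value = x;
    COMPILER_BARRIER();          // prevent compiler reordering of the stores
    IsPublished = 1;
}

int tryRecvValue()
{
    if (IsPublished)
    {
        COMPILER_BARRIER();      // prevent compiler reordering of the loads
        return Value;
    }
    return -1;                   // not yet received
}

// Hypothetical polling loop: the flag is re-read on every iteration
// because the compiler may not cache a volatile object.
int waitForValue()
{
    int v;
    while ((v = tryRecvValue()) == -1) { }
    return v;
}
```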
