Relaxed atomics in x86

Question

I have a couple of questions regarding relaxed atomics in x86 architecture:

If I understand correctly, all reads/writes of types up to 8 bytes are atomic by default. Thus, will there be any data race in the following code? That is, is there any possibility of the reader seeing a "half write"?

#include <iostream>
#include <thread>
#include <cstdint>

uint32_t shared_var = 0;

void reader() {
    while (shared_var != 42) {
        std::cout << "Reader thread: " << shared_var << std::endl;
    }
}

void writer() {
    shared_var = 42;
    std::cout << "Writer thread: " << shared_var << std::endl;
}

int main() {
    std::thread t1(reader);
    std::thread t2(writer);

    t1.join();
    t2.join();

    return 0;
}

If the answer to question 1 is no, is there any difference with the following code?

#include <iostream>
#include <thread>
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> shared_var{0};

void reader() {
    while (shared_var.load(std::memory_order_relaxed) != 42) {
        std::cout << "Reader thread: " << shared_var << std::endl;
    }
}

void writer() {
    shared_var.store(42, std::memory_order_relaxed);
    std::cout << "Writer thread: " << shared_var << std::endl;
}

int main() {
    std::thread t1(reader);
    std::thread t2(writer);

    t1.join();
    t2.join();

    return 0;
}

Can the write of 42 in the following code be reordered before the reader? (Assuming, obviously, that the write occurred before the read.)

#include <iostream>
#include <thread>
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> shared_var{0};

void reader() {
    std::cout << "Reader thread: " << shared_var.load(std::memory_order_acquire) << std::endl;
}

void writer() {
    shared_var.store(42, std::memory_order_relaxed);
    std::cout << "Writer thread: " << shared_var << std::endl;
}

int main() {
    std::thread t1(writer);
    std::thread t2(reader);

    t1.join();
    t2.join();

    return 0;
}

Your first example has UB per the C++ standard. That means it might optimize to an entirely different program even if the underlying calls are naturally atomic. — NathanOliver, May 01 '23 at 12:07
Should we care? If an ordinary write happens to be always atomic, make a guess on how `atomic` could be implemented. — BoP, May 01 '23 at 13:12
"*Thus, will there be any data race in the following code? Meaning, is there any possibility of the reader seeing a "half write"?*" That's not what "data race" means. "Data race" as a term is defined in the C++ standard. "Race condition" is a general term that is similar but less specific to the C++ memory model. — Nicol Bolas, May 01 '23 at 13:31
The processor's notion of "atomic" is not the same as the C++ notion of "atomic". If you write C++ code you need to pay attention to C++'s rules, not the processor's rules. Yes, you should care. There is more to `std::atomic` than preventing tearing. — Pete Becker, May 01 '23 at 13:31
"*Can the 42 write in the following code be reorder before the reader? Assuming obviously that the write ocurred before the read.*" This makes no sense. If the write occurred before the read... then it *occurred before the read*. To reorder the code would cause the write to occur after the read. So it's not clear what this contradiction is intended to mean. — Nicol Bolas, May 01 '23 at 13:37
What the processor guarantees matters if you write assembly directly. If you write in a higher-level language, you do not have many guarantees about the asm the compiler chooses to produce, you can only rely on the language standard. — Marc Glisse, May 01 '23 at 15:55

score 0 · Answer 1 · answered May 01 '23 at 13:42

Thus, will there be any data race in the following code?

Yes.

Meaning, is there any possibility of the reader seeing a "half write"?

That's not what "data race" means. A "data race" is a term defined by the C++ standard; the results of such a situation are undefined behavior. A "half write" is a possible outcome of such UB, but undefined behavior is undefined.
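To make the "half write" question concrete: once the variable is a std::atomic, the language (not just the hardware) guarantees that every load and store is indivisible, even with std::memory_order_relaxed, so concurrent access is not a data race. A minimal sketch under that assumption; the function name never_tears and the two bit patterns are hypothetical, chosen for illustration:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical helper: a writer flips a 32-bit atomic between two bit
// patterns while a reader checks every value it loads. Relaxed atomic
// accesses are indivisible, so the reader can only ever observe one of
// the two patterns, never a mix of old and new bytes.
bool never_tears(int iterations) {
    constexpr std::uint32_t kA = 0x00000000u;
    constexpr std::uint32_t kB = 0xFFFFFFFFu;
    std::atomic<std::uint32_t> shared_var{kA};
    std::atomic<bool> done{false};
    bool ok = true;  // written only by the reader thread

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            shared_var.store(i % 2 ? kB : kA, std::memory_order_relaxed);
        }
        done.store(true, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        while (!done.load(std::memory_order_relaxed)) {
            std::uint32_t v = shared_var.load(std::memory_order_relaxed);
            if (v != kA && v != kB) ok = false;  // would indicate tearing
        }
    });

    writer.join();
    reader.join();
    return ok;  // safe to read: join() synchronizes with the reader
}
```

Passing this check says nothing about the plain-variable version; that one is UB regardless of what the hardware does.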

The problem here is not necessarily x86, but your compiler. Because you used an ordinary, non-atomic type, and a data race is UB, the compiler is free to assume that the value of that object can only be changed by code the compiler can see. In the reader loop, there are no apparent changes to shared_var, nor are there any apparent synchronization events that would make changes from other threads visible.
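Conversely, an ordinary uint32_t is fine if every access goes through a synchronization operation. In this sketch (the function name run_with_mutex is hypothetical), both threads touch shared_var only while holding the same std::mutex; an unlock in one thread synchronizes with the next lock of that mutex in another, so there is no data race and the compiler cannot assume the value is unchanged across iterations:

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical helper: the question's reader/writer with a plain
// (non-atomic) uint32_t, where every access holds the same mutex.
std::uint32_t run_with_mutex() {
    std::uint32_t shared_var = 0;  // ordinary object, not std::atomic
    std::mutex m;

    // Reads shared_var under the lock. Locking and unlocking are
    // synchronization operations, so the compiler cannot cache the
    // value across calls.
    auto read_shared = [&] {
        std::lock_guard<std::mutex> lock(m);
        return shared_var;
    };

    std::thread reader([&] {
        while (read_shared() != 42) {
        }
    });
    std::thread writer([&] {
        std::lock_guard<std::mutex> lock(m);
        shared_var = 42;
    });

    writer.join();
    reader.join();
    return read_shared();
}
```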

Now, because your loop happens to call a bunch of operator<< overloads in the iostream library, it is entirely possible that one or more of these calls performs synchronization or is opaque to the compiler. If so, the compiler cannot simply turn your while statement into a single if check; it must execute it as written, re-reading shared_var on every iteration. Your UB then depends on the vagaries of x86 visibility.

However, if the loop body contained only code the compiler can see, and none of that code could perform synchronization or visibility operations, then the compiler would be well within its rights to optimize the function to:

void reader() {
    if(shared_var == 42)
      return;

    while(true)
    {
      //Stuff the compiler can see doesn't do synchronization.
    }
}
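For comparison, here is a sketch where the loop body is entirely visible to the compiler (it only increments a counter), but shared_var is a std::atomic. Each iteration performs an atomic load, and the standard says implementations should make atomic stores visible to atomic loads within a reasonable amount of time, so in practice compilers keep the load inside the loop rather than hoisting it into a single if check as above. The names SpinResult and spin_until_42 are hypothetical:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical result type for this sketch.
struct SpinResult {
    std::uint32_t seen;   // value that ended the loop
    std::uint64_t spins;  // iterations spent waiting for the writer
};

// Hypothetical helper: same shape as the reader/writer in the question,
// but the loop body is plain, non-synchronizing work the compiler can see.
SpinResult spin_until_42() {
    std::atomic<std::uint32_t> shared_var{0};
    SpinResult result{0, 0};

    std::thread reader([&] {
        std::uint32_t v;
        // A fresh atomic load on every iteration.
        while ((v = shared_var.load(std::memory_order_relaxed)) != 42) {
            ++result.spins;  // visible to the compiler, no synchronization
        }
        result.seen = v;
    });
    std::thread writer([&] {
        shared_var.store(42, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    // result was written only by the reader; join() makes it safe to read.
    return result;
}
```

Relaxed ordering is enough here because the reader only cares about the value of shared_var itself. If the writer also published other data for the reader to use after seeing 42, the store would need std::memory_order_release and the load std::memory_order_acquire.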
