C11 memory fence and atomic operation

Question

I'm studying memory barriers and have some questions about the following code.

//version 1
Thread A:
    *val = 1;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(published, 1, memory_order_relaxed);

Thread B:
    if (atomic_load_explicit(published, memory_order_relaxed) == 1) {
        atomic_thread_fence(memory_order_acquire);
        assert(*val == 1); // will never fail
    }

//version 2
/* Thread A */
    *val = 1;
    atomic_thread_fence(memory_order_release);
    *published = 1;

/* Thread B */
    if (*published == 1) {
        atomic_thread_fence(memory_order_acquire);
        assert(*val == 1); /* may fail */
    }

Does atomic_thread_fence only affect atomic loads/stores? And does it constrain the compiler as well, or only the CPU?
In version 2, where the store to published is non-atomic, how can the assertion fail even though atomic_thread_fence is used, if the fence is only meant for atomic loads/stores?
Why is *val = 1 not written as atomic_store_explicit(val, 1, memory_order_relaxed)?

Is `val` even an `_Atomic` type? It looks like it's the non-atomic "payload" and `published` is the atomic flag that tells other threads it's safe to look at `val`. But yes, you could equally use `atomic_int val` and do `relaxed` stores/loads on it, so any ordering comes from syncs-with on `published` and the fences. Fences do of course have to ensure the necessary compile-time ordering, otherwise they'd be near useless. — Peter Cordes, Apr 05 '23 at 03:52
The *version 2* example is either wrong (assert can't fail if read/write of `*published` is an atomic access), or it's fully undefined behaviour so it's not just that the assert might fail, the standard has nothing to say about what happens before or after that. — Peter Cordes, Apr 05 '23 at 03:54
1. `atomic_thread_fence` does order non-atomic accesses wrt. atomic accesses, but the fact that it happens to make synchronization on non-atomic accesses "work" is only an implementation detail. For possible real-world breakage of version 2 with non-atomic accesses and just compiler barriers, see https://lwn.net/Articles/793253/ (Who's afraid of a big bad optimizing compiler?), written for Linux kernel programming, where they roll their own atomics using `volatile`, inline asm, and compiler barriers, depending on GCC to define the behaviour of all that. — Peter Cordes, Apr 05 '23 at 04:02
Please always specify the type of all your variables when asking programming questions. — curiousguy, Apr 05 '23 at 20:00

score 4 · Answer 1 · answered Apr 05 '23 at 04:24

Fences do affect non-atomic loads and stores. For instance, a load or store that follows an acquire fence, whether atomic or not, must not be reordered before that fence. Otherwise the fence wouldn't be able to establish the necessary synchronization. "Reordered" covers both compile-time reordering of instructions by the compiler and run-time out-of-order execution by the CPU; a fence has to inhibit both.
It's not really that the fence is "only meant for atomic" operations. It's simply that, assuming published is non-atomic in version 2, then you have a data race on published: you have two non-atomic accesses in different threads, at least one of them a write, and no synchronization to make one of them happen-before the other. So the program's behavior is undefined.

The fences aren't a problem here; they just don't do anything to help avoid the data race. Release/acquire fences are only effective when used together with an atomic load that observes the value of an atomic store. In other contexts, they are harmless but also useless.
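As a rough sketch of how the race in version 2 goes away: if published is declared `_Atomic int`, the very same statements `*published = 1` and `*published == 1` become atomic (sequentially consistent) accesses, so the fences have an atomic store and load to work with. The function names `publish` and `try_consume` and the parameter types are my own assumptions, since the question doesn't show the declarations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper for the "Thread A" side of version 2.
 * Assumption: `published` points to an _Atomic int, so the plain
 * assignment below is an atomic seq_cst store, not a racy one. */
void publish(int *val, _Atomic int *published) {
    *val = 1;                                   /* non-atomic payload */
    atomic_thread_fence(memory_order_release);
    *published = 1;                             /* atomic store */
}

/* Hypothetical helper for the "Thread B" side of version 2.
 * Returns true if the flag was observed and the payload was checked. */
bool try_consume(const int *val, _Atomic int *published) {
    if (*published == 1) {                      /* atomic seq_cst load */
        atomic_thread_fence(memory_order_acquire);
        assert(*val == 1);                      /* cannot fail here */
        return true;
    }
    return false;                               /* not published yet */
}
```

With a plain `int *published` instead, the same two functions called from different threads would contain the data race described above.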
In version 1, *val is safe to access non-atomically. You have a release fence followed by a store (to published, of the value 1), and a load that, if it observes the store, is followed by an acquire fence. This is exactly the setup of 7.17.4p2 in the C17 standard, so the release fence synchronizes with the acquire fence (assuming that the acquire fence is actually reached). Therefore your store of *val happens-before your load of *val (if the load occurs at all), so there is no data race on *val, and the load is guaranteed to observe the stored value (5.1.2.4p20). There is also no data race on published because it is atomic.
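For reference, here is a minimal compilable sketch of version 1. It assumes `val` is a plain `int` and `published` is an `atomic_int` (the question doesn't give the types), uses POSIX threads to start the two threads, and makes Thread B spin until it sees the flag so that the acquire fence is always reached. `run_once`, `writer` and `reader` are made-up names for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int val;               /* assumed: plain, non-atomic payload */
static atomic_int published;  /* assumed: atomic flag */

/* Thread A from version 1. */
static void *writer(void *arg) {
    (void)arg;
    val = 1;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&published, 1, memory_order_relaxed);
    return NULL;
}

/* Thread B from version 1, except that it waits for the flag
 * instead of checking it only once. */
static void *reader(void *arg) {
    int *seen = arg;
    while (atomic_load_explicit(&published, memory_order_relaxed) != 1) {
        /* spin until the relaxed load observes the store */
    }
    atomic_thread_fence(memory_order_acquire);
    assert(val == 1);         /* release fence synchronizes with acquire fence */
    *seen = val;
    return NULL;
}

/* Runs both threads once and returns the payload value Thread B read. */
int run_once(void) {
    pthread_t a, b;
    int seen = -1;
    val = 0;
    atomic_store_explicit(&published, 0, memory_order_relaxed);
    pthread_create(&b, NULL, reader, &seen);
    pthread_create(&a, NULL, writer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

Calling `run_once()` from `main` (built with something like `-std=c11 -pthread`) should return 1 every time, because the read of `val` can only happen after the flag has been observed and the acquire fence has been executed.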
