
According to an article in Jeff Preshing's blog:

> A release fence prevents the memory reordering of any read or write which precedes it in program order with any write which follows it in program order.

He also has a great post explaining the differences between release fences and release operations here.

Despite the clear explanations in these blog posts, I'm still confused about how to interpret a release fence call such as `std::atomic_thread_fence(std::memory_order_release);` in terms of memory operation reordering vs. the potential fencing mechanisms which provide the guarantees of a release fence.

Is it that the compiler must guarantee, when compiling to machine code, that no write later in the thread can be moved to precede the fence statement, and that the CPU must guarantee the same when executing it?

In other words, when exactly does the fence call go from being a statement to being a guarantee?

And most importantly, is there any chance that either compiler or CPU reordering could move write operations which follow the fence statement in program order so that they precede the fence at execution time?

Josh Hardman
  • The CPU must guarantee the same. There is no distinction between compiler/CPU optimizations in the standard. I personally think that "reordering" is not the best way to think about multithreading correctness, "happens-before" is much better, being the formal model that C++ standard actually uses. Not sure how exactly that applies here, though. – yeputons Jan 11 '23 at 01:57
  • Have you checked description at [cppreference](https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence)? e.g.: `all non-atomic and relaxed atomic _stores_ that are _sequenced-before FA_ in thread A will happen-before all non-atomic and relaxed atomic _loads_ from the _same locations_ made in thread B _after FB_` - it says nothing about operations after FA, they can happen to be visible in B but can happen not to. – dewaffled Jan 11 '23 at 02:36
  • and `On x86 (including x86-64), atomic_thread_fence functions issue no CPU instructions` - it is no-op on some architectures. – dewaffled Jan 11 '23 at 02:42
  • @dewaffled: To be precise, it's a no-op in asm on x86; but it does have to block compile-time reordering, i.e. a compiler barrier. So it is still important to place it correctly in the C++ when you're compiling for a strongly-ordered ISA like x86. (I assume that's what you meant, but wanted to clarify for future readers.) – Peter Cordes Jan 11 '23 at 03:31
  • It's not clear to me what you're actually asking. Do you just want the language lawyer answer as to the conditions under which certain ordering guarantees hold? Or are you asking about the mechanism by which the compiler enforces those guarantees? Or something else? – Brian Bi Jan 11 '23 at 23:20
  • @dewaffled @PeterCordes `std::atomic_thread_fence(std::memory_order::seq_cst)` is not a no-op on x86... it issues an MFENCE. – Humphrey Winnebago Jan 12 '23 at 00:46
  • @HumphreyWinnebago it does not look so - https://godbolt.org/z/GzW4Pbs61 – dewaffled Jan 12 '23 at 01:27
  • @dewaffled That's because you wrote your code wrong. I said to use `std::atomic_thread_fence(std::memory_order::seq_cst)`. See https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence and https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html – Humphrey Winnebago Jan 12 '23 at 02:09
  • @HumphreyWinnebago: The question is asking about `acquire` and `release` fences. You're correct that `seq_cst` fences need asm instructions on all ISAs, since StoreLoad reordering is very important for performance. Only a very few primitive systems don't normally do that in hardware (and thus need an instruction to block it), e.g. 386 SMP systems (pre-486), and I think some other early sequentially-consistent ISA. So yes, dewaffled should have said `atomic_thread_fence(release)`, because it's not true for `atomic_thread_fence` in general. – Peter Cordes Jan 12 '23 at 05:46
  • @BrianBi I'm trying to clarify what is actually being guaranteed through a better understanding of the typical mechanisms from source code to CPU processing which enforce the minimal guarantee/s of a release fence called through `atomic_thread_fence`. – Josh Hardman Jan 13 '23 at 04:20
  • For example, what guarantee is there that a compiler optimization doesn't reorder the `atomic_thread_fence` call itself? Presumably that's not possible, but if it can, how does the compiler maintain the notion of program order of loads and stores with an out of order fence call? – Josh Hardman Jan 13 '23 at 04:29

1 Answer

> Is it that the compiler must guarantee, when compiling to machine code, that no write later in the thread can be moved to precede the fence statement, and that the CPU must guarantee the same when executing it?

Certain compiler optimizations must be disabled. The compiler must emit code that prevents certain CPU optimizations, including the necessary CPU fence instructions. That's what makes it a guarantee...

> is there any chance that either compiler or CPU reordering could move write operations which follow the fence statement in program order so that they precede the fence at execution time?

A `std::atomic_thread_fence` with `std::memory_order_release` prevents loads and stores before the fence ("before" in program order) from being reordered with any store after the fence (subsequent loads can still be reordered before it). At execution time an actual memory-barrier instruction per se might not be needed, as long as the guarantee holds.

A `std::atomic_thread_fence` with `std::memory_order_acquire` prevents loads and stores after the fence ("after" in program order) from being reordered with any load before the fence (earlier stores can still be reordered after it). Again, an actual memory-barrier instruction per se might not be needed at execution time, as long as the guarantee holds.
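
To make the pairing concrete, here is a minimal sketch (the names `payload` and `ready` are my own, not from the question) of the usual flag-handoff pattern built from a release fence and an acquire fence around relaxed operations:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;            // plain, non-atomic data
std::atomic<int> ready{0};  // flag accessed only with relaxed operations

void producer() {
    payload = 42;                                          // plain write
    std::atomic_thread_fence(std::memory_order_release);   // nothing above may reorder past a later store
    ready.store(1, std::memory_order_relaxed);             // the store the fence "attaches" to
}

void consumer() {
    while (ready.load(std::memory_order_relaxed) != 1) {}  // the load the fence "attaches" to
    std::atomic_thread_fence(std::memory_order_acquire);   // nothing below may reorder before an earlier load
    assert(payload == 42);                                 // guaranteed to see the producer's write
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

The release fence sequenced before the relaxed store of `ready`, together with the acquire fence sequenced after the relaxed load that reads it, is what establishes the happens-before edge from the write of `payload` to the assert.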

Note this is stricter than `std::atomic::store` and `std::atomic::load`, respectively.

A `std::atomic<T>::store` with `std::memory_order_release` prevents loads and stores before it from being reordered with just that particular store. Subsequent loads and stores can be reordered before it. This is a traditional one-way release, theoretically. (In practice, a more heavy-handed synchronization than is strictly needed might be used.)

A `std::atomic<T>::load` with `std::memory_order_acquire` prevents loads and stores after it from being reordered with just that particular load. Earlier loads and stores can be reordered after it. This is a traditional one-way acquire, theoretically.
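
For comparison, here is the same hypothetical handoff written with a release store and an acquire load on the flag itself, with no standalone fences:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;            // plain, non-atomic data
std::atomic<int> ready{0};

void producer() {
    payload = 42;                                // may not sink below the release store
    ready.store(1, std::memory_order_release);   // one-way: later operations could still move above it
}

void consumer() {
    while (ready.load(std::memory_order_acquire) != 1) {}  // one-way: earlier operations could still move below it
    assert(payload == 42);                                  // the acquire load that reads the release store
                                                            // gives the same happens-before edge
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

Here only this particular store/load pair is ordered against the surrounding accesses, which is exactly the one-way behaviour described above; the fence versions instead constrain every qualifying store after (or load before) the fence.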

Humphrey Winnebago