
Although I don't read Itanium assembly, and I don't claim to understand its memory model, I have noticed something very strange and apparently contradictory in one proposed mapping of C/C++ atomics to Itanium.

In "C/C++11 mappings to processors", the proposed implementation of atomics on Itanium emits no instructions for consume, acquire, or release fences:

Consume Fence:  <ignore>
Acquire Fence:  <ignore>
Release Fence:  <ignore>
Acq_Rel Fence:  <ignore>

(What is a consume fence anyway?)

And indeed in that proposal the relaxed atomic loads and stores are never relaxed:

Load Relaxed:   ld.acq
Load Consume:   ld.acq
Load Acquire:   ld.acq
...

We see that all relaxed simple (not RMW) operations are already Acq_Rel in these mappings. But looking at the RMW operations, we see that the relaxed and non-relaxed operations are different:

Cmpxchg Release:    cmpxchg.rel
Cmpxchg AcqRel:     cmpxchg.rel; mf

Unless the Cmpxchg AcqRel implementation has a gratuitous mf (unlikely), this means that acquire behavior is not automatic in cmpxchg.rel.

Shouldn't a relaxed, release-only RMW operation followed by an acquire fence provide at least the guarantees of acq_rel RMW? If so, doesn't that show the proposal is defective?
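To make the question concrete at the source level, here is a minimal C++ sketch of the two forms being compared. All names (`flag`, `payload`, `claim_acq_rel`, `claim_release_then_fence`, `run`) are hypothetical and chosen only for illustration. The sketch shows what the C++ memory model requires of each form. It says nothing about what any particular compiler emits for Itanium.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> flag{0};
int payload = 0;  // deliberately non-atomic

// Variant A: a single acq_rel RMW.
bool claim_acq_rel() {
    int expected = 1;
    return flag.compare_exchange_strong(expected, 2,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed);
}

// Variant B: a release-only RMW followed by an acquire fence.
bool claim_release_then_fence() {
    int expected = 1;
    bool ok = flag.compare_exchange_strong(expected, 2,
                                           std::memory_order_release,
                                           std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return ok;
}

// Publishes payload from another thread, then claims the flag with the
// given variant and reads payload back in the calling thread.
int run(bool (*claim)()) {
    flag.store(0, std::memory_order_relaxed);
    payload = 0;
    std::thread producer([] {
        payload = 42;                              // plain write
        flag.store(1, std::memory_order_release);  // publish
    });
    while (!claim()) {
        std::this_thread::yield();  // flag is still 0, retry
    }
    assert(flag.load(std::memory_order_relaxed) == 2);
    int seen = payload;  // must be ordered after the producer's write
    producer.join();
    return seen;
}
```

Calling `run(claim_acq_rel)` and `run(claim_release_then_fence)` should both be race-free under the C++ rules: in variant B the RMW reads the value written by the release store, and the acquire fence sequenced after it completes the synchronization. Under the quoted mapping, variant B would become `cmpxchg.rel` with nothing for the fence, while variant A becomes `cmpxchg.rel; mf`, which is the apparent inconsistency.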

curiousguy
  • If you use acq/rel for everything (like x86) then barriers become no-ops except for seq_cst. (Although I'm not 100% sure about barriers being supposed to affect non-atomic operations as well, or if that's just an implementation detail in current compilers). Seems like a weird choice, but maybe a good idea on that microarchitecture family for some reason. – Peter Cordes May 29 '19 at 23:07
  • So what about the `mf` in `cmpxchg.rel; mf`? Is it redundant? What does it provide if `cmpxchg` has acq? – curiousguy May 29 '19 at 23:09
  • IDK, I haven't thought about that part yet. – Peter Cordes May 29 '19 at 23:14
  • @PeterCordes "_barriers being supposed to affect non-atomic operations as well_" What in the (C or C++) std supports the idea that thread fences can order two non atomic ops? – curiousguy Nov 19 '19 at 00:15
  • Back in May that was something I wondered about. But now I think atomic_thread_fence doesn't help for non-atomic ops wrt. each other, but yes wrt. the fence in case there's an acq or rel operation on the other side of the fence somewhere in another function (e.g. the caller, or a library function). In general compilers don't usually have a *complete* picture of the source they're compiling, and have to emit function definitions that work for an arbitrary caller. Even with LTO there are some non-inline library functions. So non-atomic ops do sometimes need to be ordered wrt. a fence. – Peter Cordes Nov 19 '19 at 02:25
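
The point in the last comment, that non-atomic operations are ordered with respect to a fence when an atomic operation sits on the other side of it, can be sketched as follows. The names (`ready`, `data`, `publish`, `receive`, `demo`) are hypothetical; this is an illustration of fence-to-fence synchronization in C++, not code taken from the discussion.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<bool> ready{false};
int data = 0;  // plain, non-atomic

void publish(int value) {
    data = value;                                         // non-atomic write
    std::atomic_thread_fence(std::memory_order_release);  // orders the write above...
    ready.store(true, std::memory_order_relaxed);         // ...before this relaxed store
}

int receive() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // orders the load above...
    return data;                                          // ...before this non-atomic read
}

int demo() {
    ready.store(false, std::memory_order_relaxed);
    data = 0;
    std::thread t(publish, 7);
    int v = receive();
    assert(ready.load(std::memory_order_relaxed));
    t.join();
    return v;
}
```

Neither fence orders two non-atomic accesses against each other in isolation. The ordering exists only because a relaxed atomic store follows the release fence and a relaxed atomic load precedes the acquire fence. If `publish` and `receive` lived in separately compiled functions, the compiler would still have to keep the plain accesses on the correct side of each fence.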
