Is using std::atomic_thread_fence right before an atomic load/store with the same order always redundant?

Question

Given:

#include <atomic>
#include <cstdint>

std::atomic<uint64_t> b;

void f()
{
    std::atomic_thread_fence(std::memory_order::memory_order_acquire);

    uint64_t a = b.load(std::memory_order::memory_order_acquire);

    // code using a...
}

Can removing the call to std::atomic_thread_fence have any effect? If so is there a succinct example? Keeping in mind that other functions may store/load to b and call f.

Using `std::atomic_thread_fence` is redundant in regards to `b` insofar as you're using `std::memory_order_acquire` because that memory order requires that any writes be visible before you do your read. The fence is useful however if you are guarding non-atomic data that could be stale. But there are other issues with that and this code. — Mgetz, Oct 26 '21 at 18:27
An acquire fence has to be sequenced *after* an atomic operation in order to have any effect. So in your example, the fence might be redundant but not because it precedes an atomic operation with acquire ordering. If this doesn't answer your question, please edit your question to be more specific. — Brian Bi, Oct 26 '21 at 18:30
@Mgetz could you give an idea of what you mean by other issues? — Joseph Garvin, Oct 28 '21 at 17:54

Humphrey Winnebago · Accepted Answer · 2021-11-05T06:27:21.330

Never redundant. atomic_thread_fence actually has stricter ordering requirements than a load with mo_acquire. It's poorly documented, but the acquire fence isn't one-way permeable for loads; it preserves Read-Read and Read-Write order between accesses on opposite sides of the fence.

Load-acquires, on the other hand, only require ordering between that load and subsequent loads and stores. Read-Read and Read-Write order is enforced ONLY between that particular load-acquire and the accesses that follow it in program order. Prior loads/stores (in program order) have no restrictions. Thus the load-acquire is one-way permeable.

The release fence similarly loses one-way permeability for stores, preserving Read-Write and Write-Write order between accesses on opposite sides of the fence. See Jeff Preshing's article https://preshing.com/20130922/acquire-and-release-fences/.

By the way, it looks like you have your fence on the wrong side. See Preshing's other article https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/. With an acquire-load, the load happens before the acquire, so using fences it would look like this:

uint64_t a = b.load(std::memory_order::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order::memory_order_acquire);
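
To make the placement concrete, here is a minimal sketch of the relaxed-load-then-acquire-fence pattern guarding non-atomic data. The names payload, ready and publish_and_consume are hypothetical and not from the question; the sketch assumes a producer that publishes with a release store.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical names for illustration only.
std::uint64_t payload = 0;        // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes payload

std::uint64_t publish_and_consume()
{
    // Reset state; thread creation below orders these writes before the producer.
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    std::thread producer([] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    // Relaxed loads impose no ordering on their own.
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // The fence is sequenced AFTER the load that observed the release store,
    // so the producer's write to payload happens-before the read below.
    std::atomic_thread_fence(std::memory_order_acquire);
    std::uint64_t seen = payload;

    producer.join();
    return seen;
}
```

With the fence moved above the loop, nothing would order the read of payload after the producer's write, and the read would be a data race.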

Remember that release doesn't guarantee visibility. All release does is guarantee the order in which writes to different variables become visible in other threads. (Without this, other threads can observe orderings that seem to violate cause-and-effect.)

Here's an example using the CppMem tool (http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/). The first thread is SC, so we know the writes occur in that order. One might therefore expect that if c==1, then a and b are both 1 as well. CppMem gives "48 executions; 1 consistent, race free", indicating that it is possible for the 2nd thread to see c==1 && b==0 && a==0. This is because the relaxed c.load comes before the acquire-load of b, which only orders the accesses that follow it. So c.load is allowed to be reordered after a.load, permeating past b.load.

int main() {
  atomic_int a = 0;
  atomic_int b = 0;
  atomic_int c = 0;

  {{{ {
    a.store(1, mo_seq_cst);
    b.store(1, mo_seq_cst);
    c.store(1, mo_seq_cst);
  } ||| {
    c.load(mo_relaxed).readsvalue(1);
    b.load(mo_acquire).readsvalue(0);
    a.load(mo_relaxed).readsvalue(0);
  } }}}
}

If we replace the acquire-load with an acquire fence, c.load is not allowed to be reordered after a.load. CppMem gives "8 executions; no consistent", confirming that the outcome c==1 && a==0 is not possible.

int main() {
  atomic_int a = 0;
  atomic_int c = 0;

  {{{ {
    a.store(1, mo_seq_cst);
    c.store(1, mo_seq_cst);
  } ||| {
    c.load(mo_relaxed).readsvalue(1);
    atomic_thread_fence(mo_acquire);
    a.load(mo_relaxed).readsvalue(0);
  } }}}
}
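
For readers without CppMem at hand, the fence version can be approximated as an ordinary C++ stress test. This is only a sketch: the function name count_forbidden_outcomes and the iteration count are assumptions, and a stress test can only fail to find the forbidden outcome, whereas CppMem enumerates every execution.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: counts runs in which the reader saw c == 1 but a == 0.
int count_forbidden_outcomes(int iterations)
{
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> a{0};
        std::atomic<int> c{0};
        int seen_c = 0;
        int seen_a = 0;

        std::thread writer([&] {
            a.store(1, std::memory_order_seq_cst);
            c.store(1, std::memory_order_seq_cst);
        });
        std::thread reader([&] {
            seen_c = c.load(std::memory_order_relaxed);
            // If the load above read 1, the store to c synchronizes with
            // this fence, so the store to a is visible to the load below.
            std::atomic_thread_fence(std::memory_order_acquire);
            seen_a = a.load(std::memory_order_relaxed);
        });
        writer.join();
        reader.join();

        if (seen_c == 1 && seen_a == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

The count should stay at zero on any conforming implementation, since the memory model rules out c==1 && a==0 here.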

Edit: Improved first example to actually show the variable crossing an acquire operation.
