What is guaranteed with C++ std::atomic at the programmer level?

Question

I have read and listened to several articles, talks and Stack Overflow questions about std::atomic, and I would like to be sure that I have understood it well, because I am still a bit confused about the visibility of cache line writes given the possible delays in MESI (or derived) cache coherency protocols, store buffers, invalidate queues, and so on.

I read that x86 has a stronger memory model, and that if a cache invalidation is delayed x86 can revert started operations. But I am now interested only in what I should assume as a C++ programmer, independently of the platform.

[T1: thread 1, T2: thread 2, V1: shared atomic variable]

I understand that std::atomic guarantees that,

(1) No data races occur on a variable (thanks to exclusive access to the cache line).

(2) Depending on which memory_order we use, it guarantees (with barriers) that sequential consistency happens (before a barrier, after a barrier or both).

(3) After an atomic write(V1) on T1, an atomic RMW(V1) on T2 will be coherent (its cache line will have been updated with the written value on T1).

But as the cache coherency primer mentions,

The implication of all these things is that, by default, loads can fetch stale data (if a corresponding invalidation request was sitting in the invalidation queue)

So, is the following correct?

(4) std::atomic does NOT guarantee that T2 won't read a 'stale' value on an atomic read(V1) after an atomic write(V1) on T1.

Question if (4) is right: if the atomic write on T1 invalidates the cache line no matter the delay, why does T2 wait for the invalidation to take effect when it does an atomic RMW operation, but not when it does an atomic read?

Question if (4) is wrong: when can a thread read a 'stale' value in a way that is visible in the execution, then?

I appreciate your answers a lot

Update 1

So it seems I was wrong on (3) then. Imagine the following interleave, for an initial V1=0:

T1: W(1)
T2:      R(0) M(++) W(1)

Even though T2's RMW is guaranteed to happen entirely after W(1) in this case, it can still read a 'stale' value (I was wrong). According to this, atomic doesn't guarantee full cache coherency, only sequential consistency.

Update 2

(5) Now imagine this example (x = y = 0 and are atomic):

T1: x = 1;
T2: y = 1;
T3: if (x==1 && y==0) print("msg");

According to what we've discussed, seeing "msg" displayed on screen wouldn't give us any information beyond the fact that T2 was executed after T1. So either of the following executions might have happened:

T1 < T3 < T2
T1 < T2 < T3 (where T3 sees x = 1 but not y = 1 yet)

is that right?

(6) If a thread can always read 'stale' values, what would happen if we took the typical "publish" scenario but instead of signaling that some data is ready, we do just the opposite (delete the data)?

T1: delete gameObjectPtr; is_enabled.store(false, std::memory_order_release);
T2: while (is_enabled.load(std::memory_order_acquire)) gameObjectPtr->doSomething();

where T2 would still be using a deleted pointer until it sees that is_enabled is false.

(7) Also, the fact that threads may read 'stale' values means that a mutex cannot be implemented with just one lock-free atomic, right? It would require a synchronization mechanism between threads. Would it require a lockable atomic?

Anthony Williams · Answer 1 · 2020-01-31T15:50:58.963

(1) Yes, there are no data races.
(2) Yes, with appropriate memory_order values you can guarantee sequential consistency.
(3) An atomic read-modify-write will always occur entirely before or entirely after an atomic write to the same variable.
(4) Yes, T2 can read a stale value from a variable after an atomic write on T1.

Atomic read-modify-write operations are specified in a way to guarantee their atomicity. If another thread could write to the value after the initial read and before the write of an RMW operation, then that operation would not be atomic.
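As a rough illustration of that atomicity guarantee, here is a minimal sketch in which several threads increment one shared counter with fetch_add. The helper name count_with_relaxed_rmw and its parameters are made up for this example. Because each fetch_add is a single indivisible read-modify-write, no increment should be lost, even with memory_order_relaxed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the thread above): starts `threads` threads
// that each increment a shared atomic counter `per_thread` times, then
// returns the final value.
long count_with_relaxed_rmw(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // One indivisible read-modify-write: no other write to
                // counter can slip in between its read and its write.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    // join() synchronizes with the end of each thread, so every increment
    // is visible to this load.
    return counter.load(std::memory_order_relaxed);
}
```

With a plain `long` and `++counter` instead, the same loop would be a data race and increments could be lost.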

Threads can always read stale values, except when happens-before guarantees relative ordering.
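To make the happens-before exception concrete, here is a small sketch of the usual release/acquire "publish" pattern. The names publish_and_consume, payload and ready are assumptions chosen for this example. The consumer may read the old `false` any number of times, but once it reads `true` it is guaranteed to see the payload written before the release store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: publishes a plain int through an atomic flag and
// returns the value the consumer thread observed.
int publish_and_consume() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // This load may keep returning the old value `false` for a while.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // Having read `true`, the acquire load synchronizes with the
        // release store, so the write to payload happens-before this read.
        seen = payload;
    });

    payload = 42;
    ready.store(true, std::memory_order_release);
    consumer.join();
    return seen;
}
```

Nothing here bounds how long the consumer spins. The guarantee is only about what it sees after it observes the flag.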

If a RMW operation reads a "stale" value, then it guarantees that the write it generates will be visible before any writes from other threads that would overwrite the value it read.

Update for example

If T1 writes x=1 and T2 does x++, with x initially 0, the choices from the point of view of the storage of x are:

T1's write is first, so T1 writes x=1, then T2 reads x==1, increments that to 2 and writes back x=2 as a single atomic operation.
T1's write is second. T2 reads x==0, increments it to 1, and writes back x=1 as a single operation, then T1 writes x=1.

However, provided there are no other points of synchronization between these two threads, the threads can proceed with the operations not flushed to memory.

Thus T1 can issue x=1, then proceed with other things, even though T2 will still read x==0 (and thus write x=1).

If there are any other points of synchronization then it will become apparent which thread modified x first, because those synchronization points will force an order.

This is most apparent if you have a conditional on the value read from a RMW operation.
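A minimal sketch of that last point, with a made-up helper name: the value returned by fetch_add tells you which of the two modification orders took place, and the final value of x has to agree with it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: T1 stores 1 while T2 increments, with x initially 0.
// Returns {value read by T2's RMW, final value of x}.
std::pair<int, int> race_store_against_increment() {
    std::atomic<int> x{0};
    int read_by_rmw = -1;

    std::thread t1([&] { x.store(1, std::memory_order_relaxed); });
    std::thread t2([&] {
        // fetch_add returns the value it read as part of the RMW.
        read_by_rmw = x.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();

    const int final_value = x.load(std::memory_order_relaxed);
    // Only two modification orders are possible:
    //   store first: the RMW reads 1 and writes 2
    //   RMW first:   the RMW reads 0 and writes 1, then the store writes 1
    assert((read_by_rmw == 1 && final_value == 2) ||
           (read_by_rmw == 0 && final_value == 1));
    return {read_by_rmw, final_value};
}
```

Which outcome you get can vary from run to run; the combination "read 0, final 2" or "read 1, final 1" should never appear.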

Update 2

(5) If you use memory_order_seq_cst (the default) for all atomic operations you don't need to worry about this sort of thing. From the point of view of the program, if you see "msg" then T1 ran, then T3, then T2.

If you use other memory orderings (especially memory_order_relaxed) then you may see other scenarios in your code.
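One way to see what the single total order of seq_cst buys you is to add a second observer. In the sketch below, the fourth thread T4 and the helper name observers_disagree are additions for illustration and not part of the question. T3 checks for "x written, y not yet" and T4 checks for "y written, x not yet". With seq_cst everywhere, both checks cannot succeed in the same run, because that would require two contradictory orders of the stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns true if the two observers disagreed about
// the order of the stores to x and y in this run.
bool observers_disagree() {
    std::atomic<int> x{0}, y{0};
    bool saw_x_first = false;  // T3 saw x == 1 and then y == 0
    bool saw_y_first = false;  // T4 saw y == 1 and then x == 0

    // All loads and stores below use the default memory_order_seq_cst.
    std::thread t1([&] { x.store(1); });
    std::thread t2([&] { y.store(1); });
    std::thread t3([&] { saw_x_first = (x.load() == 1 && y.load() == 0); });
    std::thread t4([&] { saw_y_first = (y.load() == 1 && x.load() == 0); });
    t1.join();
    t2.join();
    t3.join();
    t4.join();

    return saw_x_first && saw_y_first;
}
```

If the loads and stores used acquire/release or relaxed ordering instead, the language would allow the two observers to disagree, although stronger hardware such as x86 may never show it in practice.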

(6) In this case, you have a bug. Suppose the is_enabled flag is true when T2 enters its while loop, so it decides to run the body. T1 now deletes the data, and T2 then dereferences the pointer, which is now dangling, and undefined behaviour ensues. The atomics don't help or hinder in any way beyond preventing the data race on the flag.
(7) You can implement a mutex with a single atomic variable.
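As a sketch of the mutex point, here is a bare-bones spinlock built on one std::atomic<bool>. The class SpinLock and the helper count_under_spinlock are made up for this example, and a production lock would want backoff or OS-level waiting. It works despite "stale" reads because exchange is an RMW: it always acts on the latest value in the variable's modification order, so only one thread can flip the flag from false to true.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock using a single atomic variable.
class SpinLock {
public:
    void lock() {
        // exchange returns the previous value; the thread that gets
        // `false` back is the one that acquired the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: increments a plain (non-atomic) counter from several
// threads, protected only by the spinlock, and returns the final value.
long count_under_spinlock(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // lock()/unlock()
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The release store in unlock() paired with the acquire exchange in lock() is what makes the plain `++counter` of one thread visible to the next thread that takes the lock.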

Thanks a lot @Anthony Williams for your quick answer. I have updated my question with an example of RMW reading a 'stale' value. Looking at this example, what do you mean by relative ordering and that T2's W(1) will be visible before any writes? Does it mean that once T2 has seen T1's changes it won't read T2's W(1) anymore? — Albert Caldas, Jan 31 '20 at 11:51
So if "Threads can always read stale values" it means that cache coherency is never guaranteed (at least at the C++ programmer level). Could you take a look at my update 2 please? — Albert Caldas, Jan 31 '20 at 15:00
Now I see that I should have paid more attention to the language and hardware memory models to fully understand all this; that was the piece I was missing. Thanks a lot! — Albert Caldas, Jan 31 '20 at 22:27

score 1 · Answer 2 · answered Jan 31 '20 at 12:37

Regarding (3) - it depends on the memory order used. If both the store and the RMW operation use std::memory_order_seq_cst, then both operations are ordered in some way - i.e., either the store happens before the RMW, or the other way round. If the store is ordered before the RMW, then it is guaranteed that the RMW operation "sees" the value that was stored. If the store is ordered after the RMW, it would overwrite the value written by the RMW operation.

If you use more relaxed memory orders, the modifications will still be ordered in some way (the modification order of the variable), but you have no guarantees on whether the RMW "sees" the value from the store operation - even if the RMW operation is ordered after the write in the variable's modification order.

In case you want to read yet another article I can refer you to Memory Models for C/C++ Programmers.

Thanks for the article, I hadn't read it yet. Even if it's quite old, it's been useful to put my ideas together. — Albert Caldas, Feb 01 '20 at 16:01
Glad to hear that - this article is a slightly extended and revised chapter from my master's thesis. :-) It focuses on the memory model as introduced in C++11; I might update it to reflect the (small) changes introduced in C++14/17. Please let me know if you have any comments or suggestions for improvements! — mpoeter, Feb 01 '20 at 16:59
