
Many CPUs guarantee that reads and writes of common sizes are atomic. For example, x86 guarantees that a read or write of an aligned 32-bit value is atomic.

Many, many, many programs rely on this (knowingly or unknowingly) and do not use any type of atomic coding other than that.

However, it's considered best C++ practice, especially when portability is important, not to rely on CPU behavior like this, and instead to use explicit C++ facilities such as `std::atomic`. What is the cost of that? For example, on x86, what is the cost of `std::atomic<int>` vs a plain `int`?

I imagine that most `std::atomic` implementations are smart enough to take the underlying CPU into account. Even then, there's probably some cost: atomics may insert barriers (or, at the least, compile-time barriers), and possibly more barriers than are strictly needed for correctness.
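To make the comparison concrete, here is a minimal sketch of what I mean (the variable names are just for illustration). The comments describe the code mainstream compilers typically emit for x86-64; the exact instructions depend on the compiler and flags.

```cpp
// Sketch only: contrasts plain int accesses with std::atomic<int> accesses.
// Comments describe typical x86-64 code generation, not a guarantee.
#include <atomic>

int plain_counter = 0;
std::atomic<int> atomic_counter{0};

// C++17: verify that atomic<int> needs no hidden lock on this target.
static_assert(std::atomic<int>::is_always_lock_free,
              "expected lock-free int atomics");

void plain_ops() {
    int v = plain_counter;   // plain mov; the optimizer may cache this in a
    plain_counter = v + 1;   // register, hoist it, or merge it with other writes
}

void atomic_ops() {
    // Default (seq_cst) store: typically xchg, or mov + mfence -- a full barrier.
    atomic_counter = 5;

    // Relaxed store: a plain mov, but the compiler can no longer cache the
    // value in a register or freely reorder accesses around it.
    atomic_counter.store(5, std::memory_order_relaxed);

    // Loads (relaxed, acquire, or seq_cst) are a plain mov on x86.
    int v = atomic_counter.load(std::memory_order_relaxed);
    (void)v;

    // Atomic read-modify-write: lock add / lock xadd -- this is where most of
    // the per-operation cost lives, especially under contention.
    atomic_counter.fetch_add(1, std::memory_order_relaxed);
}
```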

SRobertJames
  • If `is_lock_free` is true I would expect the run-time overhead to be zero with possibly a little additional compile-time cost. – NathanOliver Mar 11 '22 at 17:07
  • I am not sure that it is just atomicity that matters. The atomic types also guarantee that all reads of the value are up-to-date/properly ordered on all cache lines, do they not? – Joe Mar 11 '22 at 17:10
  • @Joe: It can. But it doesn't have to. It depends on the memory ordering you ask for. – Nicol Bolas Mar 11 '22 at 17:13
  • Changes to the variable won't become visible until you have some form of memory barrier. The atomic type has all the right barriers inside to make this transparent for the user, in the safest and most expensive way by default. You can often relax the memory ordering on operations and use a single atomic variable to safeguard a bunch of others. – Goswin von Brederlow Mar 11 '22 at 17:16
  • The thing is, even though reads and writes alone are atomic, this does not help much when atomics are used together with other variables, or when you want to read-modify-write (RMW) a variable. RMWs are *not* atomic on x86, as on almost all other architectures. This includes, for example, increment/decrement operations, but also CAS, which is pretty useful. – Jérôme Richard Mar 11 '22 at 17:24
  • "*x86 guarantees that a read or write of an **aligned** 32 bit val is atomic*" - except that users don't always align their data properly. So using a wrapper like `atomic` can help to force the underlying value to be aligned as needed. – Remy Lebeau Mar 11 '22 at 17:26
  • @RemyLebeau AFAIK a native variable like a 32-bit `int` is required to be aligned to a 32-bit word in C++ (the alignment requirement is at least the size of the native type). Thus, such variables are properly aligned and `std::atomic` does not change this point. It may not be fine for smaller/bigger types though. – Jérôme Richard Mar 11 '22 at 17:40
  • @JérômeRichard by default, sure, but that behavior can be overridden using alignment-related `#pragma`s, for instance. – Remy Lebeau Mar 11 '22 at 17:42
  • @RemyLebeau Indeed, but this is non-standard and quite unsafe, as it breaks standard rules that libraries, for example, can assume (see https://stackoverflow.com/questions/8568432/is-gccs-attribute-packed-pragma-pack-unsafe). – Jérôme Richard Mar 11 '22 at 17:51
  • Atomics do more than guarantee certain instructions; they also impose restrictions on compiler optimizations. None of these are "costs", they're __required__ for correctness. As for whether compilers know enough to emit good instructions, yes, of course they do. – Passer By Mar 11 '22 at 17:54
  • @PasserBy > "None of these are 'costs', they're required for correctness" In a practical sense, much (most?) code simply assumes `int` is atomic, and seems to work fine. Whether that's good luck, tempting fate, or fraught with undiscovered bugs is irrelevant: I'd like to get a sense of how much slower that code will run if it is modified to use atomics. – SRobertJames Mar 11 '22 at 19:29
  • The comparison is pointless; you don't ask about the "cost" of null pointer checks over just letting the program occasionally crash. It's *not* working fine, unless there wasn't a need for atomics in the first place. Atomic access interferes with your program in many, many ways; I don't know of any rule of thumb for its effect on timing, and I doubt there is one. – Passer By Mar 11 '22 at 19:50
  • *Many, many, many programs rely on this*: I hope that at a minimum they declare the variable as `volatile`. Otherwise, bad things can happen with optimized code. I used a lot of `volatile` variables in the past, before C++11. But now I know that, except for memory-mapped hardware registers, *every time we are tempted to use `volatile`, a `std::atomic` is in fact the correct way*. – prapin Mar 11 '22 at 19:57
  • "Many, many, many programs rely on this": Can you give some examples? To my mind, such programs would fall into one of four categories: (a) simply buggy; (b) deliberately reliant on documented or semi-documented behavior of particular compilers with particular compilation flags; (c) very old code written before C++11 and not maintained since then; (d) doesn't actually rely on atomicity the way you think it does. The performance (and correctness) impact of the change to `atomic` would be different between those four cases. – Nate Eldredge Mar 13 '22 at 16:57
  • This is because, aside from the actual instructions being used, a major performance impact of using `std::atomic` types is that many optimizations which would be allowed for non-atomic types are inhibited. Most of these optimizations would already have broken atomicity or thread safety in one way or another, so if the code used to work then they probably weren't being done - because the compiler is ancient, because they were inhibited in some other way (e.g. `volatile`), or because specific flags were being used? – Nate Eldredge Mar 13 '22 at 17:08
  • "x86 guarantees that a read or write of an aligned 32 bit val is atomic". Provided that the compiler actually emits a plain load or store instruction to access the variable. For a non-atomic type it would be allowed to do something more "clever" that might be non-atomic. Anyway, you can't do much with loads and stores alone; atomic RMW is almost essential to do anything really interesting. And if your code thought that `int x; x++;` would be an atomic RMW then it was wrong all along; no compiler would ever have emitted `lock add` there. – Nate Eldredge Mar 13 '22 at 17:12
  • Another important question: are you just going to change the types, and leave the rest of the code alone? Expressions like `std::atomic<int> a; a = 5;` are guaranteed to have `seq_cst` semantics and so you will get a full memory barrier, which adds cost. Whereas if you rewrote it as `a.store(5, std::memory_order_relaxed)` (assuming it was correct to do so!) then a plain store instruction would suffice. Likewise, if you had `a++` it gets elevated to a `lock add`; if you don't need the read and write done atomically then separate loads and stores might suffice. – Nate Eldredge Mar 13 '22 at 17:24
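
Following up on the comments from Goswin von Brederlow and Nate Eldredge above, here is a minimal sketch (the names are illustrative, not from this thread) of relaxing the memory ordering and using a single atomic variable to publish a group of plain variables. On x86, the release store and the acquire load each compile to a plain `mov`, so most of the remaining cost is in the compiler optimizations that have to be given up around them.

```cpp
// Sketch only: one atomic flag, with release/acquire ordering, safeguards a
// group of non-atomic variables.
#include <atomic>
#include <cassert>
#include <thread>

int payload_a = 0;              // plain data, written before the flag is set
int payload_b = 0;
std::atomic<bool> ready{false}; // the single atomic that publishes the data

void producer() {
    payload_a = 42;                                // ordinary stores...
    payload_b = 7;
    ready.store(true, std::memory_order_release);  // ...published by this store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // pairs with the release store
        ;                                          // spin until published
    // The acquire load guarantees these plain reads see the producer's writes.
    assert(payload_a == 42 && payload_b == 7);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```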

0 Answers