Why is std::atomic much slower than volatile bool?

Question

I've been using volatile bool for years for thread execution control and it worked fine

// in my class declaration
volatile bool stop_;

-----------------

// In the thread function
while (!stop_)
{
     do_things();
}

Now, since C++11 added support for atomic operations, I decided to try that instead

// in my class declaration
std::atomic<bool> stop_;

-----------------

// In the thread function
while (!stop_)
{
     do_things();
}

But it's several orders of magnitude slower than the volatile bool!

A simple test case I wrote takes about 1 second to complete with the volatile bool approach. With std::atomic<bool>, however, I waited about 10 minutes and then gave up!

I also tried passing memory_order_relaxed to load and store, with the same result.

My platform:

Windows 7 64-bit
MinGW gcc 4.6.x

What am I doing wrong?

NB: I know that volatile does not make a variable thread-safe. My question is not about volatile, it's about why atomic is ridiculously slow.

Your old code is incorrect, it "worked fine" by chance. Which is better: correct or fast? — Dietrich Epp, Oct 30 '12 at 09:19
Please post your simple test case (if you want meaningful comments). — CB Bailey, Oct 30 '12 at 09:19
@Mysticial: Unfortunately `volatile` by accident happens to make store and load atomic on x86. — Jan Hudec, Oct 30 '12 at 09:20
@aleguna the answer seems to be "because it does different things than `volatile`". — R. Martinho Fernandes, Oct 30 '12 at 09:20
@R.MartinhoFernandes: But for store and load it does not have to do any different thing *on x86-64*. — Jan Hudec, Oct 30 '12 at 09:21
@Mystical He never claimed that it was thread-safe or atomic. Are you implying that attempting to replace `volatile` with `std::atomic` signifies that misunderstanding? Or is it your roundabout way of saying `it does more so why should it not be slower` — r_ahlskog, Oct 30 '12 at 09:21
@Dietrich Epp, yes but 1 second vs 10 min? How can it be? Which is why I'm wondering if I'm using atomic correctly — , Oct 30 '12 at 09:22
@JanHudec: Loads and stores of word-aligned types are atomic on almost every platform on the planet, `volatile` or no. Perhaps you are thinking about memory ordering semantics, which can be used to construct larger atomic operations? — Dietrich Epp, Oct 30 '12 at 09:22
No. I admit that I jumped to the wrong conclusion. But it's a very common mistake that is made. — Mysticial, Oct 30 '12 at 09:22
Can you show the generated assembly? That should clear things up — harold, Oct 30 '12 at 09:24
@Jan [a quick test](http://ideone.com/UoEdoo) on my Windows 7 64 bit system with MinGW GCC 4.7.2 does not show any atomic operations [on the assembly output](http://pastie.org/5136626) (either that, or I'm missing something crucial here). — R. Martinho Fernandes, Oct 30 '12 at 09:27
@R.MartinhoFernandes: On x86 and x86-64 loads and stores are always atomic, caches are kept coherent. They are not necessarily synchronous, but `volatile` prevents reordering by compiler and IIRC x86 does not do any itself. — Jan Hudec, Oct 30 '12 at 09:40
Could you please post your real test case (the content of `do_things()`, along with the `main()` function). — kennytm, Oct 30 '12 at 09:44
Can you try timing an empty loop "while (__builtin_expect(stop_,0)) { }"? It should take no more than 100ns per iteration, in which case it's an expected timing. Otherwise we'll continue guessing. — bobah, Oct 30 '12 at 09:47
@KennyTM, It's not that easy to do unf as it's a unit test for a large project. But I'll try to extract a self contained test tonight — , Oct 30 '12 at 10:12
Note that C(99?) provides sig_atomic_t. In C it's a type that guarantees write consistency across interrupts; POSIX extends the language to cover concurrent writes (eg from threads) as well. I don't know what MinGW does on Windows for it, but it's going to be safe and portable I would have thought, except on the most insane platforms (sig_atomic_t could be a char, if the platform can't provide atomic store/load of a char, well..!) — Nicholas Wilson, Oct 30 '12 at 11:25
@DietrichEpp Yes, for his old code to be correct he should have to acquire a lock on `stop_`. But since it is just a true/false check (vs a value check) I can't see how a "spurious" value could make any difference, as long as the variable settles down to a TRUE/FALSE value. E.g. if it were an `int`, and he were doing `_stop++` on one thread, another thread might see the following number sequence on reading the `_stop` variable, `1`, `9000000`, `2`. Just to say. — bobobobo, May 28 '13 at 23:40

KoKuToru · Accepted Answer · 2012-10-30T12:09:14.133

31

Timings using the test code from Olaf Dietsche's answer:

 USE ATOMIC
 real   0m1.958s
 user   0m1.957s
 sys    0m0.000s

 USE VOLATILE
 real   0m1.966s
 user   0m1.953s
 sys    0m0.010s

IF YOU ARE USING A GCC VERSION OLDER THAN 4.7

http://gcc.gnu.org/gcc-4.7/changes.html

Support for atomic operations specifying the C++11/C11 memory model has been added. These new __atomic routines replace the existing __sync built-in routines.

Atomic support is also available for memory blocks. Lock-free instructions will be used if a memory block is the same size and alignment as a supported integer type. Atomic operations which do not have lock-free support are left as function calls. A set of library functions is available on the GCC atomic wiki in the "External Atomics Library" section.

So yes, the only solution is to upgrade to GCC 4.7 or later.
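
One way to see what a given toolchain reports about lock-free support is to ask the standard library directly. The sketch below is illustrative only: the helper names atomic_bool_is_lock_free and atomic_bool_lock_free_level are hypothetical and not part of the test code. It uses std::atomic<bool>::is_lock_free() and the standard ATOMIC_BOOL_LOCK_FREE macro. It says nothing about speed, and whether GCC 4.6 reports these values accurately has not been checked here.

```cpp
#include <atomic>

// Hypothetical helper (not from the answer): asks the implementation at run
// time whether operations on std::atomic<bool> are lock-free.
bool atomic_bool_is_lock_free()
{
    std::atomic<bool> flag(false);
    return flag.is_lock_free();
}

// Hypothetical helper: compile-time view of the same property.
// 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
int atomic_bool_lock_free_level()
{
    return ATOMIC_BOOL_LOCK_FREE;
}
```

Printing both values from main on the old and the new compiler would show whether they differ between the two.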

edited Oct 30 '12 at 12:09

answered Oct 30 '12 at 11:35


Well I found the problem (see update), you need to have GCC 4.7+ to have lock-free atomics – KoKuToru Oct 30 '12 at 11:58
Is there a reasonable limit in milliseconds to how much time the boolean value in question can spend in any modern processor's cache line under one cycle? If I have a guarantee that no multiple accesses are happening under 0.1ms, or let's say 0.001ms, is it safe to assume I'll get expected behavior? – MatrixAndrew Jan 19 '16 at 12:35

Olaf Dietsche · Answer 2 · 2022-04-10T12:41:26.460

Since I'm curious about this, I tested it myself on Ubuntu 12.04, AMD 2.3 GHz, gcc 4.6.3.

#if 1
#include <atomic>
std::atomic<bool> stop_(false);
#else
volatile bool stop_ = false;
#endif

int main(int argc, char **argv)
{
    long n = 1000000000;
    while (!stop_) {
        if (--n < 0)
            stop_ = true;
    }

    return 0;
}

Compiled with g++ -g -std=c++0x -O3 a.cpp

I reached the same conclusion as @aleguna:

just bool:

real 0m0.004s
user 0m0.000s
sys 0m0.004s

volatile bool:

$ time ./a.out
real 0m1.413s
user 0m1.368s
sys 0m0.008s

std::atomic<bool>:

$ time ./a.out
real 0m32.550s
user 0m32.466s
sys 0m0.008s

std::atomic<int>:

$ time ./a.out
real 0m32.091s
user 0m31.958s
sys 0m0.012s

Update 2022-04-10, AMD Ryzen 3 3200G, g++ 9.3.0:

It looks like atomic has improved a lot in comparison to volatile. I increased the loop counter to 10,000,000,000 to get a more precise picture, although this adjustment doesn't change the relative magnitude:

std::atomic<bool>, std::atomic<int>: ~2.9s
volatile bool: ~5.4s
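
The question also mentions trying memory_order_relaxed. A minimal sketch of the same test loop with explicit relaxed loads and stores follows. It is wrapped in a hypothetical function, count_iterations_relaxed, which is not part of the measured program, so that it can be called with a small counter. No timings were taken for this variant.

```cpp
#include <atomic>

// Hypothetical variant of the test loop above, using explicit relaxed
// operations instead of the default sequentially consistent ones.
// Returns the number of loop iterations executed before the flag was set.
long count_iterations_relaxed(long n)
{
    std::atomic<bool> stop(false);
    long iterations = 0;

    // Relaxed load: the read is atomic, but it imposes no ordering on
    // surrounding memory accesses.
    while (!stop.load(std::memory_order_relaxed)) {
        ++iterations;
        if (--n < 0)
            stop.store(true, std::memory_order_relaxed);  // request exit
    }
    return iterations;
}
```

Relaxed ordering is enough here because the flag is the only shared state. If the thread that sets the flag also publishes other data, release and acquire ordering would be the safer choice.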

So, 7-8 years have passed but I just tested your code on ubuntu 20 with g++ 9.4 and the atomic one is 0.773s, the volatile is 0.858s — CaptainCodeman, Apr 10 '22 at 09:14
@CaptainCodeman Thank you for showing the improvements made by g++, I updated the answer. — Olaf Dietsche, Apr 10 '22 at 12:42

score 1 · Answer 3 · answered Dec 17 '12 at 14:23

My guess is that this is a hardware question. When you write volatile, you tell the compiler not to assume anything about the variable, but as I understand it the hardware still treats it as a normal variable. This means the variable stays in the cache the whole time. When you use atomic, you use special hardware instructions, which probably means the variable is fetched from main memory each time it is used. The difference in timing is consistent with this explanation.

No. The cache is always used. – curiousguy Apr 04 '19 at 23:09
