
I know that volatile does not enforce atomicity on int for example, but does it if you access a single byte? The semantics require that writes and reads are always from memory if I remember correctly.

Or in other words: Do CPUs read and write bytes always atomically?

Axel Gneiting
    C1x also provides atomic operations – Christoph Feb 08 '11 at 17:45
  • Yes, I thought so as well. I was just wondering because the busy-waiting code in this Wikipedia article is just wrong: http://en.wikipedia.org/wiki/Busy_waiting – Axel Gneiting Feb 08 '11 at 17:46
  • @Axel Gneiting - It is wrong. :-( – Omnifarious Feb 08 '11 at 19:05
  • Unless you're dealing with the actual code for synchronization primitives, any time you see the keyword "volatile", substitute "bug". It's almost always the wrong way to do things on any particular platform, and it's always the wrong way if you want to be cross-platform. If you want to be safe, never use volatile or assume a certain word size is atomic - just use a library, e.g. pthreads. – John Ripley Feb 08 '11 at 20:07

5 Answers


Not only does the standard not say anything about atomicity, but you are likely even asking the wrong question.

CPUs typically read and write single bytes atomically. The problem is that when you have multiple cores, not all cores will see the byte as having been written at the same time. In fact, it might be quite some time (in CPU terms, thousands or millions of instructions, i.e. microseconds or maybe milliseconds) before all cores have seen the write.

So, you need the somewhat misnamed C++0x atomic operations. They use CPU instructions that ensure the order of things doesn't get messed up and that, when other cores look at the value after you've written it, they see the new value, not the old one. Their job is not so much atomicity of individual operations as making sure the appropriate synchronization steps also happen.

Omnifarious
  • +1 for mentioning synchronization. The OP needs to find methods for protecting variables from other cores (CPUs) accessing it. – Thomas Matthews Feb 08 '11 at 17:45
  • Thank you for confirming my assumption that CPUs don't exchange the cache lines when not using an atomic instruction modifier. – Axel Gneiting Feb 08 '11 at 17:47
  • Can you provide a link to these C++0x atomic operations? I found `std::lock_guard`, `std::mutex` and so on, but are these the right thing? – Orion Edwards Feb 08 '11 at 19:43
  • @Orion Edwards - I will try, but I've had difficulty finding good solid documentation for them myself. – Omnifarious Feb 08 '11 at 20:44
  • @Orion Edwards - I made it a StackOverflow question, because I haven't seen it asked: http://stackoverflow.com/questions/4938258/where-can-i-find-good-solid-documentation-for-the-c0x-synchronization-primitiv – Omnifarious Feb 08 '11 at 20:50
  • @Omnifarious sorry, I only skimmed it; your second paragraph's first sentence jumped out, and it could easily be misread. I deleted my comment – Spudd86 Feb 09 '11 at 02:54
  • @Spudd86 - Then I shall delete mine and we can forget about it. :-) – Omnifarious Feb 09 '11 at 03:34

The standard says nothing about atomicity.

Oliver Charlesworth

The volatile keyword is used to indicate that a variable (int, char, or otherwise) may be given a value from an external, unpredictable source. This prevents the compiler from optimizing away accesses to the variable.

For atomic operations you will need to check your compiler's documentation to see if it provides any intrinsics, declarations, or pragmas.

Thomas Matthews

On any sane cpu, reading and writing any aligned, word-size-or-smaller type is atomic. This is not the issue. The issues are:

  • Just because reads and writes are atomic, it does not follow that read/modify/write sequences are atomic. In the C language, x++ is conceptually a read/modify/write cycle. You cannot control whether the compiler generates an atomic increment, and in general, it won't.
  • Cache synchronization issues. On halfway-crap architectures (pretty much anything non-x86), the hardware is too dumb to ensure that the view of memory each cpu sees reflects the order in which writes took place. For example if cpu 0 writes to addresses A then B, it's possible that cpu 1 sees the update at address B but not the update at address A. You need special memory fence/barrier opcodes to address this issue, and the compiler will not generate them for you.

The second point only matters on SMP/multicore systems, so if you're happy restricting yourself to single-core, you can ignore it, and then plain reads and writes will be atomic in C on any sane cpu architecture. But you can't do much useful with just that. (For instance, the only way you can implement a simple lock this way involves O(n) space, where n is the number of threads.)

R.. GitHub STOP HELPING ICE
  • I was under the impression that cache synchronization issues are definitely an issue on x86 as well as other architectures? I vaguely remember reading something that said basically if 2 cores share cache (eg: a core 2 duo) then you'll be OK, but as soon as you get more than one physical processor, or cores without shared cache (eg: a pentium D or core 2 quad) then you need to be worrying about cache synchronization – Orion Edwards Feb 08 '11 at 19:41
  • As far as I know, every x86 ever created except some cheap Cyrix (iirc) rip-offs has strictly ordered memory writes, even between separate physical cpus. – R.. GitHub STOP HELPING ICE Feb 08 '11 at 19:42
  • 1
    The other architectures are not exactly "dumb", but rather optimized for running 100s of cores in parallel. One way to increase performance is to not waste time synchronizing caches unless explicitly asked to. The x86 is easier to work with, but doesn't scale that well for shared memory systems. – Bo Persson Feb 08 '11 at 23:47
  • ARM is not sane then? volatile isn't enough even on x86 – Spudd86 Feb 09 '11 at 02:33
  • 3
    `volatile` is sufficient on x86 if all you need is ordering. As I said in my answer, it will never help you if you need atomic read/modify/write. For the latter you need asm or C1x atomics. – R.. GitHub STOP HELPING ICE Feb 09 '11 at 02:36
  • I'll bite. Your reasoning about x86 agrees with my own. – Matt Joiner Feb 09 '11 at 05:39
  • Perhaps I could have worded my descriptions of non-x86 less-offensively. :-) – R.. GitHub STOP HELPING ICE Feb 09 '11 at 06:34

Short answer: Don't use volatile to guarantee atomicity.

Long answer: One might think that, since CPUs handle words in a single instruction, simple word operations are inherently thread-safe. The idea of using volatile is then to ensure that the compiler makes no assumptions about the value contained in the shared variable.

On modern multi-processor machines, this assumption is wrong. Given that different processor cores will normally have their own cache, circumstances might arise where reads and writes to main memory are reordered and your code ends up not behaving as expected.

For this reason, always use locks such as mutexes or critical sections when accessing memory shared between threads. They are surprisingly cheap when there is no contention (they normally have no need to make a system call) and they will do the right thing.

Typically they will prevent out-of-order reads and writes by issuing a data memory barrier (DMB on ARM) instruction, which guarantees that the reads and writes happen in the right order. Look here for more detail.

The other problem with volatile is that it will prevent the compiler from making optimizations even when it is perfectly OK for it to do so.

doron