5

Suppose in a C program I have P threads running on a 32 bit machine, and int MAX--a shared 32-bit integer

Each thread can read/write to MAX.

Requirement: the value a thread read should not be corrupted, e.g first 16bit and last 16bit are out of sync

Question: Do I need a lock to protect the read and write? or Can I safely ignore the lock, because LOAD/SAVE assembly instruction is guaranteed to happen atomically?

Wei Shi
  • 4,945
  • 8
  • 49
  • 73
  • on x86 you probably want CMPCXHG or some interlock. Writing/reading of aligned data will be atomic but usually that (write/read), alone, is not enough. You don't know what value override. – bestsss Mar 24 '11 at 09:22

5 Answers5

4

Reads and writes are atomic when the int is aligned properly. It cannot straddle the end of a cache line. A cache line is 64 bytes. Most any compiler ensures the alignment is taken care of but it can be overridden with, say, a structure packing pragma.

Yes, you need a lock to protect the value when threads perform a read-modify-write operation. You can get a cheap one from InterlockedXxxx, perhaps.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Even reads and writes to unaligned values within a cache line aren't necessarily atomic—some CPUs might break down misaligned memory accesses into multiple micro-ops without locking the memory bus. – Adam Rosenfield Mar 24 '11 at 03:27
  • 1
    @Adam - an x86 core was specified. – Hans Passant Mar 24 '11 at 03:29
  • @Hans: a 32-bit CPU was specified, but it could be ARM or MIPS or whatever. For older 32-bit 80x86 CPUs there is no such guarantee (the "reads/writes that are contained within a single cache line" guarantee is P6 or later if I remember right). – Brendan Mar 24 '11 at 03:34
  • @Hans: I didn't say "MIPS x86" and have no idea what made you think I had. – Brendan Mar 24 '11 at 03:46
  • Read the title of the question. – Hans Passant Mar 24 '11 at 03:48
  • @hans: Doh - sorry. I saw the "LOAD/SAVE assembly instruction" part and assumed it was not x86 (which has no such instruction). – Brendan Mar 24 '11 at 03:50
  • 1
    @Hans, *A cache line is 16 bytes* it's 64 byte on x86 – bestsss Mar 24 '11 at 09:20
4

A problem here is read-modify-write.

Say you have two seperate physical cores (they can be in the same physical package). They both read. The first core modifies - which is to say, increments the value, which is currently held in a register. At this point, the modification begins to propagate out to the caches - but the second core in the meantime has also modified the value and also began to propagate the cache out to cache.

You lose one of the updates.

Cache coherency protocols do not handle the case of multiple concurrent writes. There is nothing which causes one core to wait on its write because another core is also -currently writing-; because that information is simply not publically available between the cores. They -cannot- do it.

They do handle multiple consecutive writes, e.g. after the changes have been seen on a cores external bus pins (e.g. become public knowledge, rather than being internal to the core).

Another problem is instruction re-ordering. These threads - if they are running on different cores, their instruction re-ordering will not pay attention to what the other threads are doing; only to what that thread in particular is doing.

Imagine one thread is going to write the value and then set a flag. Another thread will see the flag raised and then read the value. Those threads, if on seperate cores, will re-order their instruction flow only with regard to themselves - the cores will not consider the other threads, because they cannot - they have no knowledge of them.

So in the first thread the flag setting may be re-ordered to occur before the writing of the value - after all, for just that thread, that re-ordering is fine. Those two variables are entirely disconnected. There is no ordering dependency. The dependency exists in another thread which is on a different core and so our core can't know about it.

The other thread of course will see the flag raised and read, even though the write actually hasn't happened yet.

3

No, word sized loads and stores are atomic. However it you want to perform an addition or other arithmetic a lock would be required as they may use a read and a write.

Steve-o
  • 12,678
  • 2
  • 41
  • 60
  • 3
    Only if they're aligned properly on a 4-byte boundary—a misaligned load/store is *not* atomic. A global variable *should* be aligned properly by any sane compiler, but you could end up with an unaligned variable in a tightly-packed struct, for example. – Adam Rosenfield Mar 24 '11 at 03:25
3

Different architectures/CPUs have different atomicity guarantees. C is intended to be portable, therefore you shouldn't make any assumptions about which atomicity guarantees are made by any specific architectures/CPUs. This means that even for atomic reads/writes you should use some sort of abstraction (e.g. a library that provides the necessary atomic operations for each different architecture).

As far as I know (which isn't very much - most of my experience is with 80x86 only), for most architectures, reads and writes to aligned addresses that are less than some minimum size are usually guaranteed to be atomic (where "some minimum size" may be the size of a general purpose register, the size of a cache line, or something else).

This DOES NOT include modifications (e.g. instructions/operations that read, modify, then write to an address). For an "int MAX" variable (as opposed to something like "const int MAX = 1234"), I'd assume you'd want to do something like "if(foo > MAX) MAX = foo;" and you'd need a more robust atomic operation (e.g. maybe an atomic "compare and swap" in a loop that retries if the comparison was false).

Also, don't forget about declaring your variable as "volatile".

  • Brendan
Brendan
  • 35,656
  • 2
  • 39
  • 66
3

Most (all?) modern CPU have special instructions that guarantee atomic access to such type of variables. C has now a standardized interface to these instructions. Unfortunately this is not completely implemented by most compilers, yet.

The time being you should use extensions. gcc e.g has a whole bunch of them. If you look arround on the web you also should easily find implementations with inline assembly instructions that you could use with other compilers.

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177