As we know, accesses to aligned fundamental data types are atomic on the Intel x86 architecture. How about ARMv8? I tried to find the answer in the Arm Architecture Reference Manual Armv8, for A-profile architecture, and I did find something related to atomicity: ARMv8 is other-multi-copy atomic. It promises that accesses by multiple threads to the same LOCATION are atomic. But it says a LOCATION is a byte. Suppose thread 1 writes an aligned uint64_t in memory without a lock, and thread 2 reads or writes it without a lock at the same time. Is that atomic? (uint64_t is 8 bytes, but a LOCATION is only one byte.)
1 Answer
This is explained in B2.2 of the ARMv8 Architecture Reference Manual. In general, ordinary loads and stores of up to 64 bits, if naturally aligned, are single-copy atomic. In particular, if one thread stores to an address and another loads that same address, the load is guaranteed to see either the old or the new value, with no tearing or other undefined behavior. This is roughly analogous to a relaxed load or store in C or C++; indeed, you can see that compilers emit ordinary load and store instructions for such atomic accesses: https://godbolt.org/z/cWjaed9rM
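As a rough illustration of the relaxed-atomic analogy, here is a minimal C++ sketch, assuming a C++11 (or later) compiler and a target where `std::atomic<std::uint64_t>` is lock-free, as it normally is on AArch64. The names `kOld`, `kNew` and `count_torn_reads` are made up for this example. A stress test like this cannot prove the guarantee; it can only fail to find a counterexample.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical values for illustration: the "old" and "new" 64-bit patterns.
constexpr std::uint64_t kOld = 0;
constexpr std::uint64_t kNew = 0xffffffff00000001ULL;

// One thread repeatedly stores kOld / kNew with relaxed ordering while
// another thread loads with relaxed ordering. Returns how many loaded
// values were neither kOld nor kNew (i.e. torn); the expectation is 0.
std::uint64_t count_torn_reads(int iterations) {
    std::atomic<std::uint64_t> shared{kOld};
    std::uint64_t torn = 0;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            shared.store((i & 1) ? kOld : kNew, std::memory_order_relaxed);
        }
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            std::uint64_t v = shared.load(std::memory_order_relaxed);
            if (v != kOld && v != kNew) {
                ++torn;
            }
        }
    });

    writer.join();
    reader.join();
    return torn;
}
```

Calling `count_torn_reads(1000000)` from `main` should return 0. Using `std::atomic` rather than a plain `uint64_t` matters at the language level: a racing access to a non-atomic object is a data race in C++, whatever the hardware guarantees.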
Let's prove this for an example. For simplicity, let's use an aligned 2-byte halfword H, calling its bytes H0 and H1. Suppose that in the distant past, H was initialized to 0x0000 by a store instruction Wi; the respective writes to bytes H0 and H1 will be denoted Wi.0 and Wi.1. Now let a new store instruction Wn = {Wn.0,Wn.1} store the value 0xFFFF, and let it race with a load instruction R = {R.0,R.1}. Each of the accesses Wi, Wn, R is single-copy atomic by B2.2.1, first two bullets. We wish to show that either R.0,R.1 both return 0x00, or else they both return 0xFF.
By B2.3.2 there is a reads-from relation pairing each read with some write. R.0 must read-from either Wi.0 or Wn.0, as those are the only two writes to H0, and thus it must return either 0x00 or 0xFF. Likewise, R.1 must also return either 0x00 or 0xFF. If they both return 0x00 we are done, so suppose that one of them, say R.1, returns 0xFF, and let us show that R.0 also returns 0xFF.
We are supposing that R.1 reads-from Wn.1. By B2.2.2 (2), none of the overlapping writes generated by Wn are coherence-after the corresponding overlapping reads generated by R, in the sense of B2.3.2. In particular, Wn.0 is not coherence-after R.0.
Note that Wn.0 is coherence-after Wi.0 (coherence order is a total order on writes, so one must come after the other, and we are assuming Wi took place very long ago, with sufficient sequencing or synchronization in between). So if R.0 reads-from Wi.0, we then have that Wn.0 is coherence-after R.0 (definition of coherence-after, second sentence). We just argued that is not the case, so R.0 does not read-from Wi.0; it must read-from Wn.0 and therefore return 0xFF. ∎
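The case analysis above can also be checked mechanically with a toy model. The sketch below is only an illustration of the argument, not a model of the full architecture: it enumerates the four possible reads-from choices for R.0 and R.1 and keeps those that satisfy the B2.2.2 condition as paraphrased in this answer. All names are hypothetical, and treating H0 as the low byte is an assumption that does not affect the result.

```cpp
#include <cstdint>
#include <vector>

// Which store a byte-sized read reads-from: the initial one or the new one.
enum class Src { Wi, Wn };

// Coherence order on each byte is Wi.k before Wn.k, so Wn.k is
// coherence-after R.k exactly when R.k reads-from Wi.k.
bool wn_coherence_after_read(Src rf) { return rf == Src::Wi; }

// Paraphrased condition: if any byte of R reads-from Wn, then no byte of
// Wn may be coherence-after the corresponding byte of R.
bool single_copy_atomic_ok(Src rf0, Src rf1) {
    bool reads_from_wn = (rf0 == Src::Wn) || (rf1 == Src::Wn);
    if (!reads_from_wn) {
        return true;  // both bytes read the initial value
    }
    return !wn_coherence_after_read(rf0) && !wn_coherence_after_read(rf1);
}

// Wi wrote 0x00 to each byte, Wn wrote 0xFF to each byte.
std::uint8_t byte_value(Src rf) { return rf == Src::Wi ? 0x00 : 0xFF; }

// Enumerate every reads-from combination and collect the halfword values
// that the condition permits.
std::vector<std::uint16_t> allowed_halfword_values() {
    std::vector<std::uint16_t> out;
    const Src choices[] = {Src::Wi, Src::Wn};
    for (Src rf0 : choices) {
        for (Src rf1 : choices) {
            if (single_copy_atomic_ok(rf0, rf1)) {
                out.push_back(static_cast<std::uint16_t>(
                    byte_value(rf0) | (byte_value(rf1) << 8)));
            }
        }
    }
    return out;
}
```

In this model only 0x0000 and 0xFFFF survive; the mixed combinations (one byte from Wi, the other from Wn) are rejected, which is the conclusion of the proof.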
Note that on x86, ordinary loads and stores implicitly come with acquire and release ordering respectively, and this is not true on ARM64. You have to use ldar / stlr for that.
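At the C++ level, the ordering point can be sketched with a message-passing example. This is a minimal sketch with made-up names (`message_passing_demo`, `payload`, `ready`); on AArch64, compilers typically implement the release store and acquire load below with instructions from the stlr / ldar family, though the exact instruction chosen depends on the compiler and target options.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// The producer writes plain data and then publishes a flag with release
// ordering; the consumer waits for the flag with acquire ordering and only
// then reads the data. Returns the payload value the consumer observed.
std::uint64_t message_passing_demo() {
    std::uint64_t payload = 0;        // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
    std::uint64_t seen = 0;           // written by the consumer, read after join()

    std::thread producer([&] {
        payload = 42;                                   // plain store
        ready.store(true, std::memory_order_release);   // publish
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is published
        }
        seen = payload;  // ordered after the producer's write to payload
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With release/acquire the function returns 42. If both accesses to `ready` were relaxed instead, the read of `payload` would be a data race in C++, and on ARM64 the hardware would be allowed to reorder the accesses, which is the difference from x86 described above.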

-
Regarding your statement "In particular, if one thread stores to an address and another loads that same address, the load is guaranteed to see either the old or the new value, with no tearing or other undefined behavior", I don't see any description of this in B2.2; could you point it out? I only see something in B2.3.4, which says "The Armv8 memory model is described as being Other-multi-copy atomic", but after reading it I found it vague and confusing. I need your help! – Hankin Dec 13 '21 at 01:18
-
@Hankin: I added a proof. – Nate Eldredge Dec 13 '21 at 02:17
-
Thanks for the proof. I am doing this research for production, so I need official details. From my point of view, atomicity is not about ordering relations. The truth may be related to multi-copy atomicity and multi-thread visibility. For example, with uint64_t v1 = 0, if thread A writes v1 = 0xffffffff00000001, can thread B see a partial update such as 0xffffffff00000000? But the official document (B2.3.4) only mentions that all observers can read or write one LOCATION (which is only 1 byte) coherently: "A Location is a byte that is associated with an address in the physical address space." – Hankin Dec 13 '21 at 02:56
-
@Hankin: The properties of single-copy atomicity are *defined* in terms of ordering relations (B2.2.2), so you have to think about those relations if you want to draw any conclusions. It's true that a multi-byte load or store accesses many *locations*. But B2.2.2 imposes some rules as to how those loads to the different locations may be ordered with their corresponding stores, and it is those rules which give you the atomicity guarantees that you desire. – Nate Eldredge Dec 13 '21 at 03:01
-
@Hankin: In your example, no, thread B will not see 0xffffffff00000000. The proof is exactly the same as the one I wrote, just with 8 bytes instead of 2. Multi-copy atomicity is not relevant, and anyway B2.2.4 says we don't have it. Other-multi-copy becomes relevant when you have three or more threads. Single-copy atomicity promises that each reader will see either the old or new 64-bit value; other-multi-copy only adds constraints on which ones can see which. – Nate Eldredge Dec 13 '21 at 03:14
-
Thank you for the reply. I am confused about single-copy atomicity and multi-copy atomicity. In my opinion, firstly, single-copy atomicity describes how a single CPU accesses memory. For instance, if the data is an aligned fundamental data type, the CPU can access it in a single instruction, but if the data is unaligned, some CPUs can't do it in a single instruction. Secondly, multi-copy atomicity describes how multiple processors access memory. I think my original question matches the multi-copy atomicity scenario. What do you think? – Hankin Dec 13 '21 at 03:15
-
@Hankin: No, the name "single-copy" does not mean that it only describes accesses by a single CPU. Single-copy atomicity definitely imposes rules on what can happen in a multi-processor system, and I stand by my assertion that it provides all the guarantees you asked about. If your system only has one processor then you really don't need atomicity at all, neither single-copy nor any other kind. – Nate Eldredge Dec 13 '21 at 03:17
-
Thank you. Maybe I misunderstood these two terms. So single-copy atomicity describes the atomicity of an instruction, and multi-copy atomicity describes the coherency of instructions. Am I right? – Hankin Dec 13 '21 at 03:26
-
@Hankin: I don't fully understand why the names were chosen as they were. I don't think the distinction is as simple as you say, but I don't have a good way of summarizing their properties short of repeating the full formal definitions. – Nate Eldredge Dec 13 '21 at 03:29
-
As you just said, "Other-multi-copy becomes relevant when you have three or more threads." You inspired me. For other-multi-copy, there must be at least 2 writer threads and 1 reader thread. It mainly describes the coherency of writes, and it determines the sequence of values that observers will see. Right? – Hankin Dec 13 '21 at 03:41