In armv8 assembly, why does it do ldr for an address, then synch barrier, and then do ldaxr for the same address again?

Question

I've seen this code (from Arm). It increments a variable in memory while other PEs (processing elements, i.e. cores or threads) do the same thing. So it is a critical-section problem: multiple PEs access the same data in memory.

not_exec_core:
                // Increment the sync variable to indicate this core is waiting
                ldr     x0, =core_sync
core_waiting:
                ldr     w1, [x0]                // pull line into cache
                dsb     sy
                isb
                ldaxr   w1, [x0]
                add     w1, w1, #1
                stlxr   w2, w1, [x0]
                cbnz    w2, core_waiting
                sev

The core first loads the data into the w1 register, and the comment says "pull line into cache", which I can understand. It then issues data and instruction synchronization barriers (`dsb sy`, `isb`). So the cache line is filled for this PE (and maybe for other PEs doing the same). But why does it load the variable again with the ldaxr instruction (an exclusive load with acquire semantics)? Why doesn't it use ldaxr in the first place?
It then increments the value and writes it back with stlxr, and if the exclusive store fails, it goes back to core_waiting: to retry the increment.
I would appreciate it if anyone could give me an explanation of this.
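
For comparison, the retry loop described above can be roughly sketched with C11 atomics. This is only an illustration, not the vendor's code: the helper name `signal_core_waiting` is hypothetical, the initial relaxed load merely stands in for the dummy `ldr`, and the acquire/release orderings are an assumption chosen to mirror `ldaxr`/`stlxr`. The `dsb sy` and `isb` barriers have no counterpart here.

```c
#include <stdatomic.h>

/* Shared counter, named after the symbol in the question. */
static _Atomic unsigned int core_sync = 0;

/* Hypothetical helper: atomically add 1 and return the new value. */
static unsigned int signal_core_waiting(_Atomic unsigned int *sync)
{
    /* Plain read first; it plays the role of the dummy ldr. */
    unsigned int expected = atomic_load_explicit(sync, memory_order_relaxed);

    /* Retry loop, analogous to ldaxr / add / stlxr / cbnz.
     * On failure, 'expected' is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(
               sync, &expected, expected + 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* Another core changed the value (or the store failed
         * spuriously); try again with the refreshed value. */
    }
    return expected + 1;
}
```

Calling `signal_core_waiting(&core_sync)` once per core leaves `core_sync` equal to the number of cores that have reached this point, which is what the assembly loop is counting.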

Probably because they want/need `dsb sy` to be before the `ldaxr` for this use-case, but we want the CPU to start working on the possible cache miss *before* it finishes the slow `dsb`. So it's basically a dummy `ldr` used as a prefetch. But I'm not an AArch64 expert, and don't recognize this specific code, so only commenting instead of answering. I assume it's not C compiler output; where did you see it? (Link would be a good idea for context). — Peter Cordes, Feb 26 '21 at 12:34
I think the code came with the IP we bought. It is Armv8 processor startup code. — Chan Kim, Feb 26 '21 at 14:05
Yeah, the prefetch idea makes sense, especially given the comment. It should be just an optimization that could be dropped without breaking the code. What's less clear is what the synchronization barriers are trying to protect. — Nate Eldredge, Feb 26 '21 at 17:50
