I've seen this code (from Arm). It increments a variable in memory while other PEs (processing elements, i.e. cores or threads) do the same thing, so it is a critical-section problem: multiple PEs accessing the same data in memory.
not_exec_core:
// Increment the sync variable to indicate this core is waiting
ldr x0, =core_sync
core_waiting:
ldr w1, [x0] // pull line into cache
dsb sy
isb
ldaxr w1, [x0]
add w1, w1, #1
stlxr w2, w1, [x0]
cbnz w2, core_waiting
sev
The core first loads the data into the w1 register, and the comment says "pull line into cache", which I can understand. It then issues data and instruction synchronization barriers (dsb sy and isb). So the cache line is filled for this PE (and maybe for other PEs doing the same). But why does it load the variable again with ldaxr (a load-acquire exclusive instruction)? Why doesn't it use ldaxr in the first place?
It then increments the value and tries to write it back atomically with stlxr. If the store is not successful (w2 is nonzero), it branches back to core_waiting: and tries the increment again.
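For comparison, here is how I would sketch the load, increment, and store-with-retry part of that loop in C11 atomics. This is only an illustration under my own assumptions, not Arm's code: the function name signal_core_waiting is hypothetical, core_sync here is a stand-in for the variable in the assembly, and the sketch leaves out the initial plain ldr, the dsb/isb barriers, and the sev. As far as I know, a compiler targeting AArch64 without the LSE atomics typically lowers a compare-exchange like this to an ldaxr/stlxr pair, but I have not checked the generated code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the core_sync variable in the assembly. */
static _Atomic uint32_t core_sync = 0;

/* Atomically add 1 to core_sync and return the value this caller stored.
 * Mirrors the ldaxr / add / stlxr / cbnz retry loop, nothing more. */
static uint32_t signal_core_waiting(void)
{
    /* Ordinary load to get a starting guess (not the exclusive load). */
    uint32_t seen = atomic_load_explicit(&core_sync, memory_order_relaxed);

    /* Try to replace 'seen' with 'seen + 1'. If another core changed the
     * value in between (or the weak exchange fails spuriously), 'seen' is
     * refreshed with the current value and the loop retries, like cbnz. */
    while (!atomic_compare_exchange_weak_explicit(
               &core_sync, &seen, seen + 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry with the refreshed value in 'seen' */
    }
    return seen + 1;
}
```

Each call bumps the counter by exactly one, even if several threads call it at the same time, which is the same guarantee the exclusive load/store pair gives in the assembly.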
I would appreciate it if anyone could give me an explanation of this.