How does "lock cmpxchg" work in assembly?

Question

I came across this old (GCC prior to 4.8.3 -- bug 60272) bug report https://gcc.gnu.org/ml/gcc-bugs/2014-02/msg01951.html . This is fixed now. But I have a question regarding this. I compiled the following code snippet

#include <atomic>
struct Node { Node* next; };
void Push(std::atomic<Node*>& head, Node* node)
{
    node->next = head.load();
    while(!head.compare_exchange_weak(node->next, node))
        ;
}

void Pop(std::atomic<Node*>& head){
    for(;;){
        Node* value=head.exchange(nullptr);
        if(value){
            delete value;
            break;
        }
    }
}

with :

g++ -S -std=c++11 -pthread -O3 test.cc -o test.S

The assembly generated has the following ( I put up only the relevant portion):

.....
  4 .L4:
  5   lock cmpxchgq %rsi, (%rdi)
  6   jne .L6
  7   rep ret
  8   .p2align 4,,10
  9   .p2align 3
 10 .L6:
 11   movq  %rax, (%rsi)
 12   jmp .L4
.....

Here's my question. Let's say this code is running concurrently with 2 threads. for T1, line no 5 gets executed then T1 got interrupted and T2 does something which can potentially pop the queue to completion. When OS re-schedules T1, it'll resume from line 6 , where somebody should ** re-evalute** the condition before executing jne. But if it's not re-evaluated, then that can lead to memory corruption. Am I thinking in the right direction?

The compare and exchange operation is designed to always keep the queue untainted. If it succeeds, the queue is **correctly** updated, if it doesn't, a new correct queue can be built on the next iteration. At line 6 the queue is already correctly updated, the test is to keep looping if the CAS failed. — Margaret Bloom, Nov 24 '19 at 19:49
If you're wondering how `lock cmpxchg` achieves atomicity, see [Can num++ be atomic for 'int num'?](//stackoverflow.com/q/39393850) for more about how the `lock` prefix works. — Peter Cordes, Nov 24 '19 at 21:11

Pickle Rick · Answer 1 · 2019-11-24T22:15:10.403

0

The cmpxchg instruction will set dst only if it matches eax. Otherwise it will jump to L6 which updates eax and restarts the loop. The lock prefix enables exclusive access to any memory operands of the current instruction. In other words this atomically pushes the node until it's successful. The bug was because they were not checking the results initially.

edited Nov 24 '19 at 22:15

answered Nov 24 '19 at 20:59

Pickle Rick

808
3
6

It's not a spinlock; it's lock-free. (It may have to retry so it's not wait-free) – Peter Cordes Nov 24 '19 at 21:08
It sits in a busy wait loop until an atomic operation is successful, would that not qualify as a spinlock? – Pickle Rick Nov 24 '19 at 21:18
1

No, because it's not *waiting* for another thread (which might have gone to sleep with a lock held if you had a real lock). At *any* point N threads can attempt a CAS and one will succeed. Thus there's a guarantee of forward progress. https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom. CAS retry loops are the most common example of lock-free but not wait-free algorithm building blocks. – Peter Cordes Nov 24 '19 at 21:25
Yes, but this isn't a busy *wait*. It's not *waiting* for some condition before it can succeed, it's just trying an operation that can succeed at any time, only failing if it actually races with another thread. – Peter Cordes Nov 24 '19 at 21:44
The instruction at `.L6` doesn't update EAX, the instruction at `.L4` updates EAX. – Ross Ridge Nov 25 '19 at 03:05
@PeterCordes Thanks, reading through what you shared (and some others) cleared things up. – Pickle Rick Nov 25 '19 at 19:57
@RossRidge How do you figure? The instruction at ``L4`` is ``cmpxchg`` which only compares the accumulator in ``EAX``. The instruction at ``L6`` actually updates it. – Pickle Rick Nov 25 '19 at 19:59
If you read the documentation for the CMPXCHG instruction you'll see that it will set EAX to the value the was stored in the destination operand if different. Since this is AT&T syntax assembly code, the destination operand is the second operand for both instructions.. This means the instruction `movq %rax, (%rsi)` stores the updated EAX value in the in the location pointed to by RSI. Understanding all of this, and why it's all necessary, is probably required to answer the original poster's question, but I'm not entirely sure what that is. – Ross Ridge Nov 25 '19 at 20:13
Yea, I'm aware of that, given I linked documentation to the instruction in my answer don't you think I've read it? In my answer I say that it's *updated before restarting the loop*, which it is, my comment was less clear. You're right that ``EAX`` is modified by the ``cmpxchg`` instruction, but in the context of the assembly it's irrelevant. The instruction in this case is only used to to check ``ZF``, with ``L6`` updating it on failure. Technically speaking either ``L4`` or ``L6`` will update ``EAX``, my comment should probably have been more clear but my answer remains correct. – Pickle Rick Nov 25 '19 at 21:33
Also, I think his question was about what caused the GCC bug he linked? In the original assembly generated the loop had ``mov rax``and ``jne`` immediately after the ``cmpxchg``. This caused a race check where if one thread is suspended after the exchange and another thread successfully exchanges that value, the first thread will set the accumulator to invalid memory when it's resumed. The assembly posted by OP is after the fix and checks the result of the exchange before updating the accumulator. – Pickle Rick Nov 25 '19 at 22:31
@PickleRick thanks for your comment. My question was indeed somewhat about the fix of the bug. This was mainly how in assembly things work. I posted my application-level question ( which led to this one) here : https://stackoverflow.com/questions/59041793/how-to-implement-front-method-in-a-lock-free-concurrent-queue – Writo Nov 26 '19 at 00:01

How does "lock cmpxchg" work in assembly?

1 Answers1