Here is the Linux implementation of a spinlock from arch/arm/include/asm/spinlock.h:
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
    unsigned long tmp;
    u32 newval;
    arch_spinlock_t lockval;

    prefetchw(&lock->slock);
    __asm__ __volatile__(
"1: ldrex   %0, [%3]\n"
"   add     %1, %0, %4\n"
"   strex   %2, %1, [%3]\n"
"   teq     %2, #0\n"
"   bne     1b"
    : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
    : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
    : "cc");

    while (lockval.tickets.next != lockval.tickets.owner) {
        wfe();
        lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
    }

    smp_mb();
}
...
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
    smp_mb();
    lock->tickets.owner++;
    dsb_sev();
}
My concern is that the following two lines in arch_spin_lock:

    while (lockval.tickets.next != lockval.tickets.owner) {
        wfe();

are not atomic. What if arch_spin_unlock ran on another CPU between these two lines? Then arch_spin_lock would execute the WFE instruction, but the SEV would already have been issued and will not be issued again. At worst, arch_spin_lock would wait forever, or until some unrelated event happened to wake the core. (See the timeline sketch below.)
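Concretely, this is the interleaving I have in mind, sketched as a timeline (CPU A is the waiter, CPU B the current owner; this is hypothetical reasoning, not something I have observed):

    /*
     * CPU A (arch_spin_lock)                 CPU B (arch_spin_unlock)
     * ----------------------                 ------------------------
     * sees next != owner, enters loop
     *                                        smp_mb();
     *                                        lock->tickets.owner++;  // A's turn now
     *                                        dsb_sev();              // SEV fires here
     * wfe();  // goes to sleep *after* the
     *         // SEV; what wakes it up?
     */

If nothing else wakes CPU A, it only rereads lock->tickets.owner after the WFE returns, so the wakeup and the recheck seem to be ordered the wrong way around.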
Is this correct, or am I misunderstanding something? If this is a problem, even if only in theory, is there a way to avoid it?