How is spin lock implemented under the hood?

Question

This is a lock that can be held by only one thread of execution at a time. An attempt to acquire the lock by another thread of execution makes the latter loop until the lock is released.

How does it handle the case when two threads try to acquire the lock at exactly the same time?

I think this question also applies to various other mutex implementations.

score 7 · Answer 1 · answered May 24 '10 at 23:01

As the previous poster indicates, every modern machine type has a special class of instructions known as 'atomics': they serialize execution against at least the specified memory location.

On x86, there is a LOCK assembler prefix that indicates to the machine that the next instruction should be handled atomically. When the instruction is encountered, several things effectively happen on x86.

Pending read prefetches are canceled (this means that the CPU won't present data to the program that may be made stale across the atomic).
Pending writes to memory are flushed.
The operation is performed, guaranteed atomically and serialized against other CPUs. In this context, 'serialized' means 'they happen one-at-a-time'. Atomically means "all the parts of this instruction happen without anything else intervening".

For x86, there are two commonly used instructions that are used to implement locks.

CMPXCHG. Compare and exchange. Pseudocode:

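uint32 cmpxchg(uint32 *memory_location, uint32 old_value, uint32 new_value) {
    atomically {
        uint32 current_value = *memory_location;  // value actually observed
        if (current_value == old_value)
            *memory_location = new_value;
        return current_value;  // equals old_value only if the swap happened
    }
}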

XCHG. Exchange (implicitly locked when one operand is in memory, so no LOCK prefix is needed). Pseudocode:

uint32 xchg(uint32 *memory_location, uint32 new_value) {
    atomically {
        uint32 old_value = *memory_location;
        *memory_location = new_value;
        return old_value;
    }
}
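
For reference, here is a minimal sketch of how these two primitives might be written in portable C11 with <stdatomic.h>. The names cmpxchg_u32 and xchg_u32 are made up for illustration; on x86 a compiler will typically lower them to LOCK CMPXCHG and XCHG, but that choice belongs to the compiler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper: returns the value that was in *loc before the
 * operation, which is how the caller learns whether the swap happened. */
uint32_t cmpxchg_u32(_Atomic uint32_t *loc, uint32_t expected, uint32_t desired)
{
    /* On failure, C11 writes the value it observed into `expected`.
     * On success, `expected` is left alone and already equals the old value. */
    atomic_compare_exchange_strong(loc, &expected, desired);
    return expected;
}

/* Hypothetical wrapper: unconditionally stores `desired` and returns
 * the previous contents of *loc. */
uint32_t xchg_u32(_Atomic uint32_t *loc, uint32_t desired)
{
    return atomic_exchange(loc, desired);
}
```

A caller compares the return value of cmpxchg_u32 against the expected value to see whether the store took place.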

So, you can implement a lock like this:

uint32 mylock = 0;
while (cmpxchg(&mylock, 0, 1) != 0)
    ;

We spin, waiting for the lock, hence, spinlock.
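
As a rough sketch, the same loop in C11 could look like the following, including the unlock side that the snippet above leaves out. The type and function names (spinlock, spin_trylock, spin_lock, spin_unlock) are invented for this example. If two threads run the compare-exchange at the same moment, the hardware orders the two operations, so exactly one of them observes 0 and wins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: state is 0 when free, 1 when held. */
typedef struct {
    atomic_uint state;
} spinlock;

/* One acquisition attempt: succeeds only if the state was 0. */
bool spin_trylock(spinlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire,   /* ordering if we took the lock */
        memory_order_relaxed);  /* ordering if we did not */
}

/* Spin until an attempt succeeds. */
void spin_lock(spinlock *l)
{
    while (!spin_trylock(l))
        ;
}

/* Release: publish our writes, then mark the lock free. */
void spin_unlock(spinlock *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

The acquire/release orderings are one reasonable choice here, assumed for the sketch; the default sequentially consistent operations would also be correct, just potentially slower on some machines.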

Now, unlocked instructions don't exhibit these nice behaviors. Depending on what machine you're on, with unlocked instructions, all sorts of violations of consistency can be observed. For example, even on x86, which has a very friendly memory consistency model, the following could be observed:

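    (w and x are both 0 initially)
    Thread 1        Thread 2
    mov [w], 1      mov [x], 2
    mov eax, [x]    mov eax, [w]
    mov [y], eax    mov [z], eax

At the end of this program, y and z can both have the value 0! Each store may still be sitting in its CPU's store buffer when the other CPU's load executes.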
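
To illustrate the flip side, here is a hedged sketch of the classic store-buffering pattern (each thread stores to its own location, then loads the other's) written with C11 sequentially consistent atomics, which rule out the "both read 0" outcome. It assumes POSIX threads are available; count_both_zero and the thread function names are invented for this example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int w, x;  /* shared locations */
static int y, z;         /* results, read only after both threads are joined */

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store(&w, 1);  /* seq_cst store; typically XCHG or MOV+MFENCE on x86 */
    y = atomic_load(&x);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store(&x, 2);
    z = atomic_load(&w);
    return NULL;
}

/* Runs the experiment `rounds` times and returns how often both threads
 * read 0, or -1 if a thread could not be created. */
int count_both_zero(int rounds)
{
    int hits = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        atomic_store(&w, 0);
        atomic_store(&x, 0);
        if (pthread_create(&a, NULL, thread1, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread2, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (y == 0 && z == 0)
            hits++;
    }
    return hits;
}
```

With the default sequentially consistent operations the count should stay at 0. Switching the stores and loads to memory_order_relaxed removes that guarantee, and the outcome described above becomes possible.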

Anyway, one last note: LOCK on x86 can be applied to ADD, OR, and AND, in order to get consistent and atomic read-modify-write semantics for the instruction. This is important for, say, setting flag variables and making sure they don't get lost. Without that, you have this problem:

   Thread 1      Thread 2
   OR [x], 0x1   OR [x], 0x2

With x initially 0, possible values for x at the end of this program are 1, 2, and 0x1|0x2 (3). In order to get a correct program, you need:

   Thread 1          Thread 2
   LOCK OR [x], 0x1  LOCK OR [x], 0x2
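
In C11, the flag-setting case might be sketched with atomic_fetch_or and atomic_fetch_and, which perform the whole read-modify-write as one atomic step so that no thread's update is lost. The helper names set_flags and clear_flags are made up for this example; how the compiler lowers them (for instance to a LOCK-prefixed instruction on x86) is an implementation detail.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically ORs `bits` into *flags and
 * returns the value the flags had before the update. */
unsigned set_flags(atomic_uint *flags, unsigned bits)
{
    return atomic_fetch_or(flags, bits);
}

/* Hypothetical helper: atomically clears `bits` in *flags and
 * returns the value the flags had before the update. */
unsigned clear_flags(atomic_uint *flags, unsigned bits)
{
    return atomic_fetch_and(flags, ~bits);
}
```

If one thread calls set_flags(&x, 0x1) and another calls set_flags(&x, 0x2) on a zeroed x, the final value is 3 regardless of how the two calls interleave.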

Hope this helps.

I think in your definition of cmpxchg(), you would want to return the actual value of the memory location. From https://en.wikipedia.org/wiki/Compare-and-swap, "The result of the operation must indicate whether it performed the substitution; this can be done ... by returning the value read from the memory location (not the value written to it)." — user674669, Nov 04 '12 at 20:05

score 2 · Answer 2 · answered May 23 '10 at 13:50

It depends on the processor and the threading implementation. Most processors have instructions that can be executed atomically, on top of which you can build things like spin locks. For example, IA-32 has an xchg instruction that does an atomic swap. You can then implement a naive spinlock like:

  eax = 1;
  while( xchg(eax, lock_address) != 0 );
  // now I have the lock
  ... code ...
  *lock_address = 0; // release the lock
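
A minimal C11 rendering of this exchange-based lock could look like the sketch below. The function names naive_lock and naive_unlock are invented for illustration. Note one deliberate difference: the release is an atomic store with release ordering rather than a plain assignment, because in C the compiler must be told not to move the critical section's accesses past the unlock.

```c
#include <stdatomic.h>

/* Swap 1 into the lock word. If the old value was 0, the lock was free
 * and is now ours; otherwise someone else holds it, so try again. */
void naive_lock(atomic_int *lock_address)
{
    while (atomic_exchange_explicit(lock_address, 1, memory_order_acquire) != 0)
        ;
}

/* Mark the lock free and make the critical section's writes visible
 * to the next thread that acquires it. */
void naive_unlock(atomic_int *lock_address)
{
    atomic_store_explicit(lock_address, 0, memory_order_release);
}
```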

Logan Capaldo

Again, it depends on the processor architecture. Implementation of `xchg` is a microcode/hardware problem. Ultimately, nothing actually happens at exactly the same time: even with multiple processors/cores, they share a bus and a clock. It should be possible to find the control path for one of these instructions for a specific processor (SPARC might be the easiest), but there is unlikely to be a specific answer that applies in general. – Logan Capaldo May 23 '10 at 13:59
So a lock is ultimately implemented based on the fact that nothing actually happens at exactly the same time? – httpinterpret May 23 '10 at 14:11
Saying "nothing actually happens at exactly the same time" is in some sense an overstatement on my part. Rather, there are some things that can't happen at the same time as themselves. For example, say you have a machine with 32 bits of memory. Two processors can't simultaneously flip bit 0, because there's only one logical line leading into that bit (flip-flop or whatever). On top of these properties things like `xchg` get built, and on top of things like `xchg`, spinlocks, mutexes etc. get built. – Logan Capaldo May 23 '10 at 14:30
@httpinterpret (via fante) Just to make things clear, a 'lock' needs to be defined in 'shared' memory. In multiprocessor scenarios, the hardware forces 'serialized' access to this 'shared' memory location by each of the cores. Therefore, there will never be two or more concurrent executions of an 'xchg' instruction on the same 'shared' memory location. So the 'naive' spinlock proposed by Logan Capaldo (as shown above) should work flawlessly. Anyway, I am not an expert. Any advice will be appreciated. – drs Jul 04 '14 at 20:12
