I've been reading through Understanding the Linux Kernel (Bovet & Cesati),
and the chapter on kernel synchronisation states that the spin lock acquisition code boils down to:
    1: lock:
       btsl $0, slp
       jnc 3
    2: testb $1, slp
       jne 2
       jmp 1
    3:
I originally thought the nested loops seemed wasteful and that you could implement something like:
    1: lock:
       btsl $0, slp
       jc 1
which would be a lot simpler. However, I see why they did it: the lock prefix affects the other CPUs, and the timings for the btsl are larger than those for a simple testb.
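To check my understanding of the difference, here is roughly how I'd sketch the two acquisition loops in C11 atomics. This is my own approximation, not kernel code: the type and function names are made up, the lock word stands in for slp, and I'm assuming the atomic fetch-or plays the role of the locked btsl.

```c
#include <stdatomic.h>

/* Hypothetical lock word standing in for slp; bit 0 set means "held". */
typedef struct { atomic_uint word; } sketch_spinlock;

/* Simple version: every iteration is an atomic read-modify-write. */
void sketch_lock_simple(sketch_spinlock *l)
{
    while (atomic_fetch_or_explicit(&l->word, 1u, memory_order_acquire) & 1u)
        ; /* old bit was 1, so someone else holds it: retry the atomic op */
}

/* Nested version: after a failed attempt, wait using plain loads only. */
void sketch_lock_nested(sketch_spinlock *l)
{
    for (;;) {
        /* Atomic test-and-set of bit 0 (the btsl step). */
        if (!(atomic_fetch_or_explicit(&l->word, 1u, memory_order_acquire) & 1u))
            return; /* old bit was 0: lock acquired (the "jnc 3" path) */

        /* Read-only wait until the bit looks clear (the testb/jne loop). */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) & 1u)
            ;
    }
}
```

In the nested version the expensive atomic operation is only retried once the lock appears free, which is the point of the inner loop as I understand it.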
The one thing I haven't been able to get my head around is the subsequent release of the spin lock. The book states that it yields the following:
    lock:
    btrl $0, slp
My question is basically: why? It seems to me that a lock/mov-immediate combo is faster.
You don't need to get the old state to the carry flag since, following the rule that the kernel is bug-free (assumed in lots of other places inside said kernel), the old state will be 1 (you wouldn't be trying to release it if you hadn't already acquired it).
And a mov is much faster than a btrl, at least on the 386.
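To make the two release options concrete, this is how I'd sketch them in C11 atomics. Again the names are hypothetical and this is only my approximation of the idea: the first function clears bit 0 with an atomic read-modify-write (my stand-in for the locked btrl), the second just stores 0 without looking at the old value (my stand-in for the mov-immediate). What instructions a compiler actually emits for either is an assumption on my part.

```c
#include <stdatomic.h>

/* Hypothetical lock word standing in for slp; bit 0 set means "held". */
typedef struct { atomic_uint word; } sketch_lock_word;

/* Release as the book shows it: atomic read-modify-write clearing bit 0. */
void sketch_unlock_rmw(sketch_lock_word *l)
{
    /* The old value is returned by the call but deliberately ignored here. */
    atomic_fetch_and_explicit(&l->word, ~1u, memory_order_release);
}

/* Release as I'd have expected: store 0, never reading the old state. */
void sketch_unlock_store(sketch_lock_word *l)
{
    atomic_store_explicit(&l->word, 0u, memory_order_release);
}
```

Both leave the lock word at 0 when called by the holder; the question is whether the cheaper store gives up anything the read-modify-write provides.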
So what am I missing?
Have the timings changed for those instructions on later chips?
Has the kernel been updated since the book was printed?
Is the book just plain wrong (or showing simplified instructions)?
Have I missed some other aspect involving synchronisation between CPUs that the faster instruction doesn't satisfy?