Your code is indeed broken!
The reason is that access to the variable blocked is not atomic. Two threads can read it simultaneously and both decide that the mutex is unlocked, if both reads happen before the first thread writes out the true update and that update propagates to all CPUs.
You need atomic variables and an atomic exchange operation to solve this. The atomic_flag type is exactly what you want:
#include <atomic>

std::atomic_flag blocked = ATOMIC_FLAG_INIT; // starts "clear"; pre-C++20 this initializer is required

while (blocked.test_and_set()) { } // spin while "true", i.e. until we are the ones who set it
// critical work goes here
blocked.clear(); // unlock
(Alternatively, you could use a std::atomic<bool> and exchange(true), but atomic_flag is made especially for this purpose.)
Atomic variables not only prevent the compiler from reordering code that would appear unrelated in a single-threaded context; they also make the compiler emit whatever instructions are needed to keep the CPU itself from reordering operations in a way that would produce an inconsistent execution order.
In fact, if you want to be slightly more efficient, you can demand a less expensive memory ordering on the set and clear operations, like so:
while (blocked.test_and_set(std::memory_order_acquire)) { } // lock
// ...
blocked.clear(std::memory_order_release); // unlock
The reason is that you only care about correct ordering in one direction: a delayed update in the other direction costs little, whereas requiring sequential consistency (the default) may well be expensive.
Important: The above code is a so-called spin lock: while the state is locked, we busy-wait in the while loop. This is very bad in almost all situations. A kernel-provided mutex system call is a whole different kettle of fish, since it allows the thread to tell the kernel that it wants to sleep, so the kernel can deschedule the entire thread. That is almost always the better behaviour.