14

boost::shared_mutex or std::shared_mutex (C++17) can be used for single writer, multiple reader access. As an educational exercise, I put together a simple implementation that uses spinlocking and has other limitations (eg. fairness policy), but is obviously not intended to be used in real applications.

The idea is that the mutex keeps a reference count that is zero if no thread holds the lock. If > 0, the value represents the number of readers that have access. If -1, a single writer has access.

Is this a correct implementation (in particular with the used, minimal, memory orderings) that is free of data races ?

#include <atomic>

class my_shared_mutex {
    std::atomic<int> refcount{0};
public:

    void lock() // write lock
    {
        int val;
        do {
            val = 0; // Can only take a write lock when refcount == 0

        } while (!refcount.compare_exchange_weak(val, -1, std::memory_order_acquire));
        // can memory_order_relaxed be used if only a single thread takes write locks ?
    }

    void unlock() // write unlock
    {
        refcount.store(0, std::memory_order_release);
    }

    void lock_shared() // read lock
    {
        int val;
        do {
            do {
                val = refcount.load(std::memory_order_relaxed);

            } while (val == -1); // spinning until the write lock is released

        } while (!refcount.compare_exchange_weak(val, val+1, std::memory_order_acquire));
    }

    void unlock_shared() // read unlock
    {
        // This must be a release operation (see answer)
        refcount.fetch_sub(1, std::memory_order_relaxed);
    }
};
Sebastian D'Agostino
  • 1,575
  • 2
  • 27
  • 44
LWimsey
  • 6,189
  • 2
  • 25
  • 53

1 Answers1

7

(CAS = Compare And Swap = C++ compare_exchange_weak function, which on x86 will typically compile to an lock cmpxchg instruction which can only run when it owns the cache line in Exclusive or Modified MESI state).


lock_shared looks good: spinning read-only attempting a CAS only when it looks possible is better for performance than spinning on CAS or atomic increment. You already needed to do a read-only check to avoid changing -1 to 0 and unlocking a write-lock.

On x86, put a _mm_pause() in the retry path of the spin loop to avoid memory-order mis-speculation pipeline nukes when exiting the read-only spin loop, and steal fewer resources from the other hyperthread while spinning. (Use a while() loop, not do{}while(), so the pause only runs after failing once. pause on Skylake and later waits about 100 cycles so avoid it in the fast-path.)


I think unlock_shared should be using mo_release, not mo_relaxed, since it needs to order the loads from the shared data structure to make sure a writer doesn't start writing before the loads from the reader critical section happen. (LoadStore reordering is a thing on weakly-ordered architectures, even though x86 only does StoreLoad reordering.) A Release operation will order preceding loads and keep them inside the critical section.


(in write lock): // can memory_order_relaxed be used if only a single thread takes write locks ?

No, you still need to keep the writes inside the critical section, so the CAS still needs to Synchronize With (in C++ terminology) release-stores from unlock_shared.

https://preshing.com/20120913/acquire-and-release-semantics/ has a nice image that shows the 1-way barrier effect of a release-store or acquire-load.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • I wasn't sure about the memory order in unlock_shared, but my reasoning was that it does not really 'release' anything since it has read-only access and cannot change the data it protects – LWimsey Dec 17 '16 at 05:15
  • @LWimsey: yeah, it's harder to think about load ordering than store ordering, but it's a real thing. A load becomes globally visible at the moment when it reads data from L1 cache. (Because that's when it makes a copy from globally-coherent cache into the out-of-order core of a single CPU.) – Peter Cordes Dec 17 '16 at 05:29
  • @LWimsey Are you saying that lock_shared could immediately do an unlock operation? Crazy, but you seem to imply that ordering WRT to unlock_shared doesn't matter... – curiousguy Aug 11 '19 at 22:08
  • 2
    @curiousguy `lock_shared` cannot unlock the shared mutex; It can only take a read-lock if already in unlocked state. You are right about the ordering in `unlock_shared`, that definitively has to be a release operation. – LWimsey Aug 12 '19 at 05:50
  • "_when the value is_" is what? – curiousguy Aug 14 '19 at 19:25