
I'd like to write a function that only one thread at a time can execute. I don't need busy waits; a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:

#include <atomic>

std::atomic<bool> busy(false);

bool func()
{
    if (busy.exchange(true) == true)
        return false;  // another thread is already running it

    // ... do stuff ...

    busy.exchange(false);
    return true;
}
  1. Is the logic for the atomic exchange correct?
  2. Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand, a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.
  • Do you *ever* need a thread to wait? Or will *every* thread that ever accesses this have something else to do if it can't get the lock? – David Schwartz Nov 17 '20 at 23:09
  • @DavidSchwartz b) threads that can't get the lock have something else to do. – Ignorant Nov 17 '20 at 23:14
  • 1
    FYI: The thing that you are trying to implement has a name. It is called a [_mutex_](https://en.wikipedia.org/wiki/Lock_(computer_science)). Also, sometimes known as an _advisory lock,_ or simply a _lock,_ and the code lines that are protected by a mutex are sometimes known as a [_critical section_](https://en.wikipedia.org/wiki/Critical_section). "Mutex" is a portmanteau of "mutual exclusion." – Solomon Slow Nov 18 '20 at 02:34
  • 2
    Also note: The idea that you are trying to prevent multiple threads from entering the same function at the same time is a distraction. It seldom really matters how many threads enter the same function at the same time. What _really_ matters is how many threads access the same _data_ at the same time. It's easy to mix up the two ideas when there's only one function that accesses the data. You can get into trouble when there's more than one function that accesses the data unless _all_ of the functions lock the same mutex (see the sketch after these comments). – Solomon Slow Nov 18 '20 at 02:38
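
A minimal sketch of that last point, with made-up names: two different functions touching the same data must lock the same mutex, and then it no longer matters that they are different functions:

#include <mutex>
#include <vector>

std::mutex data_mtx;            // one mutex guards the shared data...
std::vector<int> shared_data;   // ...no matter which function touches it

void producer_add(int v)
{
    std::lock_guard<std::mutex> lock(data_mtx);
    shared_data.push_back(v);
}

bool consumer_take(int& out)
{
    std::lock_guard<std::mutex> lock(data_mtx);   // same mutex, same data
    if (shared_data.empty())
        return false;
    out = shared_data.back();
    shared_data.pop_back();
    return true;
}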

2 Answers


Your atomic swap implementation might work. But trying to do thread-safe programming without a lock is almost always fraught with issues and is often harder to maintain.

Unless you need a specific performance improvement, std::mutex with the try_lock() method is all you need, e.g.:

#include <mutex>

std::mutex mtx;

bool func()
{
    // making use of std::unique_lock so if the code throws an
    // exception, the std::mutex will still get unlocked correctly...

    std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
    bool gotLock = lck.owns_lock();

    if (gotLock)
    {
        // do stuff
    }

    return gotLock;
}
selbie
  • I'd love to use a try-lock, unfortunately I am in a 100% lock-free scenario :) – Ignorant Nov 17 '20 at 22:15
  • Can you elaborate why you can't use locks? – selbie Nov 17 '20 at 22:18
  • 3
    @Ignorant: Your code is logically equivalent to this, and very likely performance-equivalent, too. If you never use `mtx.lock()` or whatever, you're never actually blocking i.e. waiting for the lock, so your code is still potentially [lock-free](https://en.wikipedia.org/wiki/Non-blocking_algorithm) in the progress-guarantee sense. "lock-free" does *not* mean "avoiding anything with 'lock' in its name". A hand-rolled spinlock using std::atomic would not be lock-free, while using only the non-blocking functionality of std::mutex *is* still lock-free (and potentially wait-free). – Peter Cordes Nov 17 '20 at 22:20
  • Some std::mutex implementations may do slightly more work than a simple atomic flag in the uncontended and failed-trylock cases, but any good implementation should avoid making any system calls if used this way. (@selbie: the mutex should probably be a `static` local var, not global.) – Peter Cordes Nov 17 '20 at 22:22
  • @selbie: my function is part of a more complex data structure that is handled by a high priority thread (audio programming). I'm mostly concerned about priority inversion problems. – Ignorant Nov 17 '20 at 22:23
  • @PeterCordes - if the mutex was static local inside the function, what happens when both threads enter the function at the same time for the first time? Will the construction of the static mutex be appropriately serialized? How does the compiler guarantee a static local var is only initialized once with respect to threads? – selbie Nov 17 '20 at 22:25
  • @selbie: ISO C++ guarantees that static local vars are constructed exactly once, if they need runtime construction at all. Implementations typically protect it with something like double-checked locking to make the already-initted fast path a pure read-only check. But yeah, if `std::mutex` can't just be statically initialized in the `.data` section, making it global would help performance. If you hand-roll a boolean flag, there'd be no downside: it can definitely be statically initialized without needing a runtime constructor. – Peter Cordes Nov 17 '20 at 22:30
  • Turns out pthreads / libstdc++'s `std::mutex` (on GNU/Linux) doesn't need runtime construction; an all-zero static initializer is fine so it can be in the .bss section. https://godbolt.org/z/163dEb shows the lack of a guard variable when it's inside the function. (And by contrast, how GCC implements `static unsigned count = x;` in another function, which becomes much simpler with `=0;`) – Peter Cordes Nov 17 '20 at 22:35
  • @Ignorant If the high priority thread only calls `tryLock`, what's the issue? – David Schwartz Nov 17 '20 at 23:05
  • 1
    @selbie Note that if your code example throws an exception in the `do stuff` section, the mutex won't be unlocked properly. Consider using a `std::unique_lock` with the `std::try_to_lock` policy so the mutex is always unlocked if it is successfully locked, even if an exception is thrown. `bool func() { std::unique_lock lck(mtx, std::try_to_lock); bool gotLock = lck.owns_lock(); if (gotLock) { ... } return gotLock; }` – Remy Lebeau Nov 18 '20 at 00:15

Your code looks correct to me, as long as you only leave the critical section by falling off the end, not by returning early or throwing an exception (otherwise busy would never be set back to false).

You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire. (But it does need to be an atomic RMW like exchange or compare_exchange_strong.)
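
Here's a sketch of your function with those orderings applied (same logic as your code, just with the cheaper unlock):

std::atomic<bool> busy(false);

bool func()
{
    // Acquire RMW: at most one thread sees false and wins; acquire
    // keeps the critical section from hoisting above the lock.
    if (busy.exchange(true, std::memory_order_acquire))
        return false;

    // ... do stuff ...

    // A plain release store is enough to unlock; no RMW needed.
    // Release keeps the critical section from sinking below the unlock.
    busy.store(false, std::memory_order_release);
    return true;
}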

Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing it is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.


Your algo is exactly like a spinlock, but without retry if the lock's already taken (i.e. just a try_lock). All the reasoning about the memory order necessary for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in @selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or whatever, you're never actually blocking, i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.

Rolling your own with an atomic<bool> is probably fine; using std::mutex here gains you nothing, because all you want from it is try-lock and unlock. A mutex can certainly do that (with some extra function-call overhead), but some implementations might do more work than a plain flag, and you're not using any of its functionality beyond that. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire / release, it's easy to get that right yourself.

The usual performance reason not to roll your own locking is that std::mutex will be tuned for the OS and typical hardware: things like exponential backoff, x86 pause instructions while spinning a few times, then falling back to a system call, plus efficient wakeup via system calls like Linux futex. All of that only benefits the blocking behaviour; try_lock leaves it unused, and if no thread ever sleeps, unlock never has any other threads to notify.


There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this, making your function exception-safe by ensuring it always unlocks before exiting, if it got the lock.
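
That said, if you do stick with the atomic flag, a hand-rolled RAII wrapper is only a few lines. A sketch (the TryLockGuard name is made up for illustration):

#include <atomic>

class TryLockGuard
{
    std::atomic<bool>& flag_;
    bool locked_;
public:
    // Try to take the flag; exchange returns the previous value,
    // so seeing false means we acquired it.
    explicit TryLockGuard(std::atomic<bool>& flag)
        : flag_(flag),
          locked_(!flag.exchange(true, std::memory_order_acquire)) {}

    // Release on scope exit, even if the critical section throws.
    ~TryLockGuard()
    {
        if (locked_)
            flag_.store(false, std::memory_order_release);
    }

    // Non-copyable: only one owner of the lock.
    TryLockGuard(const TryLockGuard&) = delete;
    TryLockGuard& operator=(const TryLockGuard&) = delete;

    bool owns_lock() const { return locked_; }
};

std::atomic<bool> busy(false);

bool func()
{
    TryLockGuard lock(busy);
    if (!lock.owns_lock())
        return false;

    // ... do stuff; the flag is released even if this throws ...
    return true;
}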

Peter Cordes
  • So if I understand correctly, the fact that only one thread can enter the critical section makes the algorithm lock-free, even if using a mutex. The lock-freedom is lost in case more than one thread is allowed (the mutex would do its job of triggering sleep syscalls and such). – Ignorant Nov 17 '20 at 22:34
  • 1
    @Ignorant: lock-freedom is a progress guarantee, see the wiki link in my answer. You're already stretching it here because no other threads can make progress on whatever's in the critical section if another thread enters it and then happens to sleep or die. Doing something *else* between retries doesn't make the protected work itself lock-free. But if it's not that simple and the caller can actually still contribute progress to the same overall task when the function returns false, then it's arguably part of a lock-free algorithm. – Peter Cordes Nov 17 '20 at 22:43
  • 1
    @Ignorant: To be lock-free, you definitely have to avoid sleeping to wait for some other thread to do something (that's not a sufficient condition, e.g. a spinlock isn't lock-free). But I don't see why allowing multiple threads into the critical section would make it non-lock-free. – Peter Cordes Nov 17 '20 at 22:48
  • 1
    I'm pretty sure a lock-free counting semaphore is possible, one that fails instead of sleeping if the limit is exceeded, e.g. `atomic<int> slots_left = 3` and acquire with `compare_exchange_weak` to decrement if the current value is non-zero (sketch below). You'd need a CAS retry loop, which is lock-free but not wait-free, because you don't want to bail out on spurious failure from contention when another thread changes the count from 2 to 1, for example: there's still a slot left for this thread to take. – Peter Cordes Nov 17 '20 at 22:48
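
A sketch of that counting-semaphore idea (the try_acquire / release names are made up for illustration):

#include <atomic>

std::atomic<int> slots_left(3);   // allow up to 3 threads in at once

// Fail-instead-of-sleep acquire: lock-free but not wait-free,
// because contention can force the CAS loop to retry.
bool try_acquire()
{
    int old = slots_left.load(std::memory_order_relaxed);
    while (old > 0)
    {
        // On failure, compare_exchange_weak reloads `old`; retry rather
        // than bail, since another thread changing the count from 2 to 1
        // still leaves a slot for this thread to take.
        if (slots_left.compare_exchange_weak(old, old - 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
            return true;
    }
    return false;   // limit exceeded: reject instead of sleeping
}

void release()
{
    slots_left.fetch_add(1, std::memory_order_release);
}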