9

Condition variables should have have a single order with respect to notify() and unlock_sleep() (an imaginary function call used within wait() where the mutex is unlocked and the thread sleeps as one atomic sequence of operations) operations. To achieve this with arbitrary lockables std::condition_variable_any implementations typically use another mutex internally (to ensure atomicity and to sleep on)

If the internal unlock_sleep() and notify() (notify_one() or notify_all()) operations are not atomic with respect to each other you risk a thread unlocking the mutex, another thread signaling and then the original thread going to sleep and never waking up.

I was reading the libstdc++ and libc++ implementations of std::condition_variable_any and noticed this code in the libc++ implementation

{lock_guard<mutex> __lx(*__mut_);}
__cv_.notify_one();

the internal mutex is locked and then immediately unlocked before the signal operation. Doesn't this risk the problem I described above?

libstdc++ seems to have gotten this right

Curious
  • 20,870
  • 8
  • 61
  • 146
  • I'm unfamiliar with `unlock_sleep()`. Can you provide a link to docs? Also, by `notify()` do you mean either of `notify_one()` or `notify_all()`? – Michael Burr Oct 29 '17 at 18:46
  • 1
    @MichaelBurr sorry I meant the general operation of going to sleep and unlocking the mutex as one atomic operation. I'll update the question – Curious Oct 29 '17 at 18:47

1 Answers1

1

The C++11 and later standards explicitly say "The execution of notify_one and notify_all shall be atomic". So in one sense I think that you are correct that the internal mutex should be held across the call down to the platform's underlying condition variable notify call (for example pthread_cond_signal())

However, I don't think that the libc++ implementation will cause notifications to be missed because without the notifying thread synchronizing on the lock the waiting thread passes to wait() (or some other synchronization between the two threads) while calling notify_one() (or notify_all()) there is no way to ensure which of the two threads is 'first' to the notify or wait anyway. So if the notification can be missed in libc++'s current implementation, it could also be missed if libc++ were changed to hold the internal lock across its call down to the platform's notify API.

So I think that libc++ can invoke the "as if" rule to say that the implementation of notify_one()/notify_any() is "atomic enough".

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • > "if the notification can be missed in libc++'s current implementation, it could also be missed if libc++ were changed to hold the internal lock across its call down to the platform's notify API" I don't think so, if the internal lock was held. Then the whole sequence of going to sleep and unlocking the internal mutex would be atomic. If the internal lock was held then there would be no way for a notify to creep in right in the middle of unlocking the internal mutex and going to sleep (which is the problem I am asking about) – Curious Nov 02 '17 at 15:15
  • How are these two scenarios different: **scenario 1)** thread W's `wait()` acquires the internal mutex then thread N's `notify_one()` occurs (because it has already released the internal mutex) before thread W actually calls down to `pthread_cond_wait()`, and **scenario 2)** thread N's `notify_one()` call executes entirely just before thread W `wait()` call acquires the internal mutex? I believe that without some external synchronization the user of the API can't distinguish between those two situations. – Michael Burr Nov 02 '17 at 15:53
  • Why acquire the mutex in the first place in `notify_one()` then? – Curious Nov 02 '17 at 16:09
  • http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2406.html#gen_cond_var this might explain it better than I can – Curious Nov 02 '17 at 16:14
  • I think that you're asking about the problem of missing a notification when the predicate has been updated (please correct me if I'm wrong). ThreadW needs to hold a lock while checking the predicate and continue holding that lock when calling `wait()` (if it decides it need to wait). ThreadN needs to hold that same lock when updating the predicate (though it doesn't need to hold the lock when calling `notify_xxx()`). But the internal mutex in `condition_variable_any` is *not* the lock that protects the predicate. So the internal mutex isn't what protects from missing a notify. – Michael Burr Nov 02 '17 at 17:55
  • My understanding is that the internal mutex is just something that `condition_variable_any` can use with the underlying platform API (since that API needs an actual mutex, while the thing that can be passed to `condition_variable_any::wait()` doesn't need to be a mutex - it just needs to be something that's "lockable". `condition_variable_any::wait()` uses the internal mutex as a surrogate when it calls the platform API for the lockable object that gets passed in. – Michael Burr Nov 02 '17 at 17:56
  • And I can't explain why the `notify_xxx()` call acquires/releases the internal mutex. Clearly I don't have a complete understanding of what's going on - which is not unusual when dealing with low level synchronization. – Michael Burr Nov 02 '17 at 17:56
  • I think I understand why the `notify_xxx()` functions acquire/release the mutex, and it is critical to preventing the missed notification scenario. However, it does still require that the the user of `condition_variable_any` properly use an external lock to protect the check and update of the predicate. I don't have time to write up my understanding right now - it'll have to wait until after work (hopefully it won't have evaporated from my brain by then...) – Michael Burr Nov 02 '17 at 18:41