5

I have read in many places that there is some overhead associated with std::condition_variable_any. Just wondering, what is this overhead?

My guess here is that since this is a generic condition variable that can work with any type of lock, it requires a hand-rolled implementation of waiting (perhaps built on another condition_variable and mutex, or a futex, or something similar), so the extra overhead probably comes from that? As opposed to std::condition_variable, which can just be a thin wrapper around pthread_cond_wait() (and its equivalents on other systems). But I'm not sure...


As a followup: if I were implementing something that waits on, say, a shared mutex, is this type of condition variable a bad choice because of the performance overhead? What else could I do in that situation?

Curious
  • 20,870
  • 8
  • 61
  • 146
  • Why do you have to guess when the source code of at least two major implementations is freely available? – T.C. Oct 07 '17 at 08:49
  • 2
    If you need the features of `condition_variable_any`, you need it. Otherwise use the plain `condition_variable`, if that's what you need. It is unlikely that we can come up with a simple and extremely fast method that the standard library implementers haven't thought of. – Bo Persson Oct 07 '17 at 09:26
  • on vc++, there is an overhead - the cv_any allocates its state in a shared_ptr, and when you wait on it, it duplicates the pointer and locks both your given lock and its own internal lock. but as commented above, if you need cv_any, I guess there's no way around it. – David Haim Oct 07 '17 at 10:33
  • @DavidHaim having some trouble understanding this, why does the lock have to be wrapped in a shared_ptr? The lifetime of the lock should be valid till the lifetime of the condition variable right? Why extend its lifetime? – Curious Oct 08 '17 at 07:16
  • 1
    Because that mutex ("lock") needs to outlive the condition variable, thanks to [thread.condition.condvarany]/5. – T.C. Oct 19 '17 at 05:20
  • @T.C. Are there good usecases for that? I can imagine where that might be useful, but the specification is hidden away where I did not look (I'm guessing most people don't know about this either?) Also I feel like most programs take care to manually ensure the condition variable and mutex have the appropriate lifetimes.. – Curious Oct 19 '17 at 05:24
  • I'm talking about the `std::mutex` used internally by the `condition_variable_any`, not the ones passed by the user; the user has no direct control over the internal mutex's lifetime. As to why, presumably it does that because plain `condition_variable` does that, and plain `condition_variable` does that because pthreads condition variables do that. – T.C. Oct 19 '17 at 05:49
  • @T.C. yeah I got which mutex the standard was talking about. How does `std::condition_variable` ensure the lifetime of the associated mutex is shared across the threads that are calling wait on the condition variable? – Curious Oct 19 '17 at 05:51
  • `std::condition_variable` is just a thin wrapper around `pthread_cond_t` or equivalent. It has no internal mutex to manage. – T.C. Oct 19 '17 at 22:24
  • @T.C. I may have miscommunicated but I meant to say this - how do pthread condition variables and mutexes do that? Their lifetimes are managed manually right? – Curious Oct 19 '17 at 22:50
  • @Curious Do what, exactly? – T.C. Oct 20 '17 at 19:36

1 Answer

5

pthread_cond_wait() / SleepConditionVariableSRW(), just like plain std::condition_variable::wait(), need only a single, atomic syscall to release the mutex, wait on the condition variable, and re-acquire the mutex afterwards. The thread goes to sleep immediately, and another thread - ideally one that was blocked on the mutex - can take over immediately on the same core.

With std::condition_variable_any, unlocking the passed BasicLockable and starting to wait on the native event / condition is more than a single syscall: the unlock() method of the BasicLockable is invoked first, and only then is the syscall for waiting issued. So you have at least the overhead of the separate unlock(), plus you are more likely to trigger a less-than-ideal scheduling decision on the OS side. Worst case, the unlock even causes a waiting thread to be resumed on a different core, with all the associated overhead.

In the other direction, e.g. on spurious wakes, there are also OS-side scheduling optimizations possible when dealing with a native mutex (as used by std::mutex) which don't apply to a generic BasicLockable.

Both variants involve some bookkeeping in order to provide the notify_all() logic (it's effectively one event / condition per waiting thread) as well as the guarantee that all methods are atomic, so they both come with a small overhead anyway.

The real overhead comes from how well the OS can make a good scheduling decision on the combined signal-and-wait-and-lock syscall. If the OS isn't smart about that scheduling, it makes virtually no difference which of the two you use.

Ext3h
  • 5,713
  • 17
  • 43