
Consider this simple synchronization problem. I have two threads, A and B, that each execute 2 steps. I want step 1a to be performed before step 2b.

Thread A Thread B
Step 1a Step 1b
Step 2a Step 2b

I have some options for how to implement this.

std::condition_variable + std::mutex + bool

This is the solution proposed by this Stack Overflow answer and this LeetCode discussion page.

Thread B will wait on the condition variable, and Thread A will notify the condition variable. The mutex is required because condition_variable's wait takes a std::unique_lock<std::mutex> as its argument.

#include <iostream>
#include <thread>
#include <condition_variable>
#include <mutex>

std::condition_variable step_1a;
std::mutex a_mutex_I_guess;
bool step_1a_done = false;

void Step_1a() {
    std::cout << "step 1a" << "\n";
}
void Step_2a() {
    std::cout << "step 2a"  << "\n";
}
void Step_1b() {
    std::cout << "step 1b" << "\n";
}
void Step_2b() {
    std::cout << "step 2b" << "\n";
}


void A() {
    //std::unique_lock<std::mutex> lck{ a_mutex_I_guess }; unnecessary
    Step_1a();
    step_1a_done = true;
    //lck.unlock(); unnecessary
    step_1a.notify_one();
    Step_2a();
}

void B() {
    Step_1b(); 
    std::unique_lock<std::mutex> lck{ a_mutex_I_guess };
    step_1a.wait(lck, []() { return step_1a_done; });
    Step_2b();
}


int main() {

    std::thread thread_A{ A };
    std::thread thread_B{ B };

    thread_A.join();
    thread_B.join();

}
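
As a comment below points out, the lock that is commented out in A() is actually needed: the flag read by the wait predicate has to be written while holding the same mutex, otherwise the write races with the predicate and the notification can be missed. Here is a minimal sketch of that variant. The helper name run_with_locked_flag and the step log are hypothetical, added only so the ordering can be checked; they are not part of the original program.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs threads A and B once and returns the order
// in which the four steps were recorded.
std::vector<std::string> run_with_locked_flag() {
    std::mutex m;
    std::condition_variable cv;
    bool step_1a_done = false;  // guarded by m

    std::mutex log_mutex;
    std::vector<std::string> log;
    auto record = [&](const char* step) {
        std::lock_guard<std::mutex> guard{ log_mutex };
        log.emplace_back(step);
    };

    std::thread thread_A{ [&] {
        record("1a");
        {
            // Write the flag under the same mutex that B's wait uses.
            std::lock_guard<std::mutex> lck{ m };
            step_1a_done = true;
        }
        cv.notify_one();  // notifying after releasing the lock is allowed
        record("2a");
    } };

    std::thread thread_B{ [&] {
        record("1b");
        std::unique_lock<std::mutex> lck{ m };
        cv.wait(lck, [&] { return step_1a_done; });
        lck.unlock();
        record("2b");
    } };

    thread_A.join();
    thread_B.join();
    return log;
}
```

Whatever the interleaving, "1a" should appear before "2b" in the returned log; the other steps may land in any order.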

To me, this seems like overkill. std::condition_variables are designed to handle multiple waiting threads. std::mutex is intended to protect shared data, not to be fodder for wait. On top of all of that, I needed bool step_1a_done to actually keep track of whether or not step_1a had completed.

As a measure of their complexity, the mutex, condition_variable, and bool together require 153 (80 + 72 + 1) bytes of memory on my machine.
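
For reference, here is a sketch of how such numbers can be obtained with sizeof. The function names are hypothetical, and the values are implementation-specific. sizeof only reports the size of the object itself, not any state the library allocates or uses elsewhere.

```cpp
#include <condition_variable>
#include <cstddef>
#include <future>
#include <iostream>
#include <mutex>

// Hypothetical helper: object sizes of the three pieces used by the
// condition_variable approach, added together.
constexpr std::size_t condvar_approach_bytes() {
    return sizeof(std::mutex) + sizeof(std::condition_variable) + sizeof(bool);
}

// Prints the individual sizes; results differ between standard libraries.
void print_sizes() {
    std::cout << "mutex: " << sizeof(std::mutex) << "\n"
              << "condition_variable: " << sizeof(std::condition_variable) << "\n"
              << "bool: " << sizeof(bool) << "\n"
              << "total: " << condvar_approach_bytes() << "\n"
              << "promise<void>: " << sizeof(std::promise<void>) << "\n";
}
```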

std::binary_semaphore

Alternatively, I can use a binary semaphore. Semantically, the binary semaphore isn't meant for one-time use. However, it gets the job done with simpler tools than the previous option.

#include <iostream>
#include <thread>
#include <semaphore>

std::binary_semaphore step_1a_sem{ 0 };

void Step_1a() {
    std::cout << "step 1a" << "\n";
}
void Step_2a() {
    std::cout << "step 2a"  << "\n";
}
void Step_1b() {
    std::cout << "step 1b" << "\n";
}
void Step_2b() {
    std::cout << "step 2b" << "\n";
}


void A() {
    Step_1a();
    step_1a_sem.release();
    Step_2a();
}

void B() {
    Step_1b(); 
    step_1a_sem.acquire();
    Step_2b();
}


int main() {

    std::thread thread_A{ A };
    std::thread thread_B{ B };

    thread_A.join();
    thread_B.join();

}

step_1a_sem requires only 1 byte of memory.

Question

My assessment is that binary_semaphore is better. However, even better would be a "one_time_semaphore" that documents (or enforces) in my code that release should only be called once. Are there C++ concurrency primitives that are a better fit for this thread synchronization problem?

EDIT: std::promise<void>

@Daniel Langr has pointed out that std::promise<void> also works. While this seems like the exact use case of std::promise<void>, things appear significantly more complicated under the hood than with a binary_semaphore. The memory requirement is 24 bytes.

#include <iostream>
#include <thread>
#include <future>

std::promise<void> step_1a_done;

void Step_1a() {
    std::cout << "step 1a" << "\n";
}
void Step_2a() {
    std::cout << "step 2a"  << "\n";
}
void Step_1b() {
    std::cout << "step 1b" << "\n";
}
void Step_2b() {
    std::cout << "step 2b" << "\n";
}


void A() {
    Step_1a();
    step_1a_done.set_value();
    Step_2a();
}

void B() {
    Step_1b(); 
    step_1a_done.get_future().wait();
    Step_2b();
}


int main() {

    std::thread thread_A{ A };
    std::thread thread_B{ B };

    thread_A.join();
    thread_B.join();

}
Mark Wallace
  • If you want a *truly* one time barrier, you should look at `std::latch` – Kaldrr May 10 '22 at 07:21
  • @Scheff'sCat Yes, I missed the *one-time-use* statement. – Louis Go May 10 '22 at 07:25
  • If you want to keep complexity down a `std::atomic` would get the job done. – super May 10 '22 at 07:28
  • @Kaldrr that's a good suggestion. Semantically, it represents what I want, but it is 8 bytes, so there is more complexity than with a ```binary_semaphore```. – Mark Wallace May 10 '22 at 07:30
  • @super I had not considered that. Yes, I think that's the perfect solution. – Mark Wallace May 10 '22 at 07:33
  • @MarkWallace The size of it really depends on the compiler/standard library you're using; both GCC and Clang have the same size for it https://godbolt.org/z/Kaadhxf97. In MSVC the sizes indeed differ, as `latch` uses an `atomic` that is 8 bytes, while `binary_semaphore` uses an `atomic` that is 1 byte. You could compare the underlying operations to see if they differ in any meaningful way for your case. – Kaldrr May 10 '22 at 07:40
  • OT: I think that what you commented as _unnecessary_ is actually _necessary_. When you wait on a condition variable with some condition, then that condition must be updated under the same locked mutex. – Daniel Langr May 10 '22 at 08:17
  • @super I would just add that then you need _busy waiting_ on that atomic flag (such as with a spin lock). – Daniel Langr May 10 '22 at 08:19
  • If you don't have C++20, you can also consider using the _void future_ as described in Item 39 of Meyers' Effective Modern C++ book. It's very simple. You have a shared `std::promise<void> p;` variable, thread 1 then calls `p.set_value();` and thread 2 waits on `p.get_future().wait();`. – Daniel Langr May 10 '22 at 08:27
  • @DanielLangr ```std::promise<void>``` is 24 bytes on MSVC, 24 bytes on gcc, and 8 bytes on clang. While this seems like the exact use case of ```std::promise<void>```, looking at the compiler implementation, it appears that things are significantly more complicated under the hood than with a ```binary_semaphore```. – Mark Wallace May 10 '22 at 19:05
  • @MarkWallace There is a difference between _busy_ and _blocked_ waiting. You don't specify which one you want. Busy waiting can be implemented with a single byte in memory (even single bit). Blocked waiting is more complicated and the corresponding data structures require more memory space. – Daniel Langr May 11 '22 at 06:37
  • @DanielLangr You are right. I assume, then, that ```binary_semaphore``` busy waits? If so, that would mean ```std::promise``` is the preferred block-wait option and ```binary_semaphore``` is the preferred busy-wait option. – Mark Wallace May 11 '22 at 07:08
  • @MarkWallace I did some experiments and looked at some source code and it seems that the whole problem is much much more complicated. In my case with GCC/libstdc++, the barrier, latch, and semaphores from `std` actually implemented blocked waiting. However, I don't think this is guaranteed. For example, there is some conditional compilation in the source code for semaphores, which chooses between POSIX semaphores (blocked waiting) and a custom implementation. See https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/semaphore_base.h#L52. – Daniel Langr May 11 '22 at 09:24
  • @MarkWallace Anyway, you should not count on the bytesize of the corresponding data structure itself. If this data structure (such as `std::binary_semaphore`) internally uses some other mechanisms (such as POSIX semaphores), then there might be some additional data allocated and used internally. I am afraid that then there is no other way than to inspect the implementation source code. Note that even `std::promise` uses a shared state allocated outside of the promise object. The sizeof of a promise then does not give you the complete memory overhead. – Daniel Langr May 11 '22 at 09:26
  • @MarkWallace My overall recommendation would be to use `std::latch` or `std::binary_semaphore` for (likely) blocked waiting since C++20, and `std::promise` before C++20. For busy waiting, just spin on `std::atomic`. Last note: this live demo shows that the bytesize of latches and semaphores is the very same with libstdc++ and libc++ (I suspect they may be using futexes internally): https://godbolt.org/z/553rT4cG1. Barriers are more complicated. – Daniel Langr May 11 '22 at 09:48
