I'm writing a C++ thread pool implementation and using pthread_cond_wait in my worker's main function. How much time passes between signaling the condition variable and the waiting thread(s) waking up? Is there a way to estimate or calculate this time?

Thank you very much.

  • It will be OS (and OS version) specific. The time it takes for the OS scheduler to wake a thread is outside the scope of C++. – Richard Critten Aug 20 '17 at 16:57
  • @RichardCritten Not to mention system board and CPU specific, and specific to the system load and priority configuration thereof, plus some real-time variables to perturb the least significant digits in the calculation, like the arrival of various interrupts, and how hot/cold are the caches and whatnot. – Kaz Aug 21 '17 at 02:12

2 Answers


It depends on the cost of a context switch, which in turn depends on:

  1. the OS,
  2. the CPU,
  3. whether the switch is to a thread in the same process or to a different process,
  4. the load on the machine,
  5. whether the thread resumes on the same core it last ran on,
  6. the working set size,
  7. the time since the thread last ran.

Best case on Linux (i7; thread in the same process; same core it last ran on; it was the last thread to run there; no load; working set of 1 byte): about 1100 ns.

Bad case (flushed from cache, different core, different process): expect about 30 µs of CPU overhead.

Where does the cost go:

  1. Saving the previous context: 70-400 cycles.
  2. Loading the new context: 100-400 cycles.
  3. If it is a different process: flushing the TLB and reloading it with 3 to 5 page walks, each of which may have to go to memory at ~300 cycles. Add a few more page walks if more than one page is touched, counting both instructions and data.
  4. OS overhead: we all like the nice statistics, so for example the context switch counter has to be incremented.
  5. Scheduling overhead: deciding which task to run next.
  6. Potential cache misses on the new core: ~12 cycles per cache line from its own L2 cache, and it goes downhill from there the farther away the data is and the more of it there is.
Surt
  • Sounds about right. On Windows, waking a thread with a small working set that has been waiting on an event for some time takes 3-7 µs (assuming there is a core free to run it 'immediately'). – Martin James Aug 20 '17 at 21:29
  • This is one of the cases where AMD's extra TLB page walkers might reduce the time slightly, i.e. more TLB misses can be handled in parallel. – Surt Sep 25 '22 at 20:34

As mentioned, the time a condition variable takes to react depends on many factors. One option is to simply measure it. Start a thread that waits on a condition variable. A second thread takes a timestamp right before signaling the variable, and the waiting thread takes a timestamp the moment it wakes up. The difference between the two timestamps is a rough approximation of how long it takes the waiting thread to notice the signal.

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

// steady_clock is monotonic, which is what interval measurements need
using clock_type = std::chrono::steady_clock;

int main()
{
    std::mutex mx;
    std::condition_variable cv;
    clock_type::time_point t0, t1;
    bool signaled = false; // guarded by mx
    bool done = false;     // guarded by mx

    std::thread th([&]() {
        std::unique_lock<std::mutex> lock(mx);
        for (;;)
        {
            // the predicate protects against spurious and lost wakeups
            cv.wait(lock, [&] { return signaled || done; });
            if (done)
                break;
            t1 = clock_type::now(); // timestamp the moment the waiter wakes up
            signaled = false;
        }
    });

    for (int i = 0; i < 25; ++i) // measure 25 times
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // let the waiter block
        {
            std::lock_guard<std::mutex> lock(mx);
            signaled = true;
            t0 = clock_type::now(); // timestamp right before signaling
        }
        cv.notify_one();
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // let the waiter record t1
        std::lock_guard<std::mutex> lock(mx);
        std::printf("test#%-2d: cv reaction time: %6.3f micro\n", i,
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    {
        std::lock_guard<std::mutex> lock(mx);
        done = true;
    }
    cv.notify_one();
    th.join();
}

When I tried it on Coliru, it produced this output:

test#0 : cv reaction time: 50.488 micro
test#1 : cv reaction time: 55.057 micro
test#2 : cv reaction time: 53.765 micro
test#3 : cv reaction time: 50.973 micro
test#4 : cv reaction time: 51.015 micro
test#5 : cv reaction time: 57.166 micro
and so on...

On my Windows 11 laptop I got values roughly 5-10x faster (5-10 microseconds).

Pavel P