I wrote a thread pool with as many threads as I have spare cores, to avoid context switching. Whenever a new task needs to be executed, it is added to a lock-free ring buffer from which the threads of the pool consume. Each time a new task is added, I currently call sem_post.
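To make the setup concrete, here is a minimal sketch of the wake-up pattern described above. It is not my actual code: the lock-free ring buffer is replaced by a plain atomic counter (`pending`), and `NUM_WORKERS`, `pool_start`, `pool_submit` and `pool_stop` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_WORKERS 2          /* assumption: stands in for the number of spare cores */

static sem_t tasks_available;  /* one post per submitted task, plus one per worker at shutdown */
static atomic_int pending;     /* hypothetical stand-in for the lock-free ring buffer */
static atomic_int executed;    /* number of tasks "run" so far */
static pthread_t workers[NUM_WORKERS];

static void *worker_main(void *arg)
{
    (void)arg;
    for (;;) {
        /* Sleep until someone posts; retry if interrupted by a signal. */
        while (sem_wait(&tasks_available) == -1 && errno == EINTR)
            continue;

        /* Try to claim one pending task. */
        int n = atomic_load(&pending);
        while (n > 0 && !atomic_compare_exchange_weak(&pending, &n, n - 1))
            ;
        if (n == 0)
            return NULL;                 /* woken with nothing queued: a shutdown post */

        atomic_fetch_add(&executed, 1);  /* placeholder for actually running the task */
    }
}

void pool_start(void)
{
    int rc = sem_init(&tasks_available, 0, 0);
    assert(rc == 0);
    (void)rc;
    atomic_store(&pending, 0);
    atomic_store(&executed, 0);
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_create(&workers[i], NULL, worker_main, NULL);
}

/* Producer side: publish the task first, then wake a worker.
 * The sem_post call here is the one whose latency I am asking about. */
void pool_submit(void)
{
    atomic_fetch_add(&pending, 1);
    sem_post(&tasks_available);
}

/* Call only after the last pool_submit. Returns the number of tasks executed. */
int pool_stop(void)
{
    for (int i = 0; i < NUM_WORKERS; ++i)
        sem_post(&tasks_available);      /* one extra post per worker so each can exit */
    for (int i = 0; i < NUM_WORKERS; ++i)
        pthread_join(workers[i], NULL);
    sem_destroy(&tasks_available);
    return atomic_load(&executed);
}
```

Calling `pool_start()`, then `pool_submit()` a number of times, then `pool_stop()` should report that same number of executed tasks.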
My benchmarks show that the call to sem_post takes 10 microseconds when there are threads waiting on the semaphore. Some calls take only 50 ns (which probably means that it could be established entirely in user space that there were no threads to wake up), but 350 +/- 30 ns is also a frequently seen value.
This question is about the case where one or more threads have/had nothing to do and are waiting on the semaphore.
I am not happy at all that in that case the caller (that tries to wake up a new thread) spends 10 microseconds in sem_post.
Isn't there a faster way (from the point of view of the caller) to wake up a sleeping thread? I can live with a delay of 10 microseconds until that new thread finally starts running, but the thread that does the waking up should not be delayed that much.
Related questions that I could find (but do not answer my question) are
Note that a semaphore seems to be implemented on top of a futex. I'd think that a futex is the fastest possible way on Linux? Or would it perhaps be faster to use a signal or an interrupt?
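For context, this is roughly what using a futex directly looks like on Linux. It is only a sketch under my own assumptions: glibc does not export a futex wrapper, so the raw `syscall` interface is used, and `futex_wait`, `futex_wake`, `signal_event` and `wait_event` are hypothetical helper names. Note that `FUTEX_WAKE` is itself a system call, so I am not claiming this is faster than sem_post when a thread is actually sleeping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep while *addr still equals `expected`.
 * Returns -1 with errno == EAGAIN if the value already differs. */
long futex_wait(_Atomic uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Wake up to `count` threads sleeping on addr; returns how many were woken. */
long futex_wake(_Atomic uint32_t *addr, int count)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, count, NULL, NULL, 0);
}

/* Waker side: set the flag in user space, then enter the kernel to wake one sleeper. */
long signal_event(_Atomic uint32_t *flag)
{
    atomic_store(flag, 1);
    long woken = futex_wake(flag, 1);
    assert(woken >= 0);
    return woken;
}

/* Sleeper side: block until the flag becomes nonzero. */
void wait_event(_Atomic uint32_t *flag)
{
    while (atomic_load(flag) == 0)
        futex_wait(flag, 0);   /* returns immediately if the flag changed in the meantime */
}
```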