High-performance spinlock best practice

Question

I'm writing a multithreaded realtime audio application, and during the render cycle there will sometimes be no jobs on the work queue. I would like the thread to wait for jobs without triggering an OS context switch (which could be too damaging to performance), yet also "efficiently", in the sense that we probably don't need to be burning 100% of the processor's capacity. I'm not quite sure what that means, but apparently Windows has a YieldProcessor() macro which signals that we are spin-waiting, so the processor can give some of its resources to other hyperthreads, and that made me think there could be a best practice here. It is a cross-platform application.

If there is no portable solution, I'm interested in platform-specific information for Windows, MacOS, Android, and iOS.

Currently it looks essentially like this:

// Variables shared between threads:
std::atomic<bool> done{false};
LockFreeFIFO<Job> workQueue;

// Thread process loop:
while (!done.load()) {
    Job job;
    if (workQueue.remove(&job)) {
        job.run();
    }
    else {
        // "semi-yield"?   What goes here?
    }
}

I believe that this is going to be platform-specific and that you have exited the zone of what can be done portably. — David Schwartz, Dec 18 '19 at 19:06
You might have to read years of comp.programming.threads to answer this. — A.K., Dec 18 '19 at 19:13
@A.K. well that's why I'm asking for help, because maybe somebody has done that already and has some wisdom to share. — luqui, Dec 18 '19 at 19:14
Punching "CPU relax" into your favorite search engine may be helpful. By the way, your bigger problem is not about not burning the CPU, it's about the fact that you care deeply about performance but are going to take the mother of all branch mispredictions the moment this thread has real work to do. Also, the busy core will draw on the CPU's power and heat budgets, potentially slowing other cores down. (All deeply platform-specific.) — David Schwartz, Dec 18 '19 at 19:15
On Intel platforms, this seems to be what the PAUSE instruction (`_mm_pause()` intrinsic) is for. — Sebastian Redl, Dec 18 '19 at 19:19
*"efficiently" in the sense that we probably don't need to be burning at 100% of the processor's capacity* This is a contradiction unless you're talking about hyperthreading (SMT), in which case you want `_mm_pause()`. Or do you mean you hope there's a system call that will low-power sleep the core without letting another task pollute cache on it? — Peter Cordes, Dec 18 '19 at 19:23
You should avoid inventing synchronization primitives unless you are really sure the OS-provided ones do not work. Context switches are damaging when they are very frequent, e.g. hundreds of thousands per second. You might want to first quantify how much of a problem this actually is. — A.K., Dec 18 '19 at 19:23
@A.K., inserting an OS yield in the `else` causes latency spikes of 250% in my simple test case. I am following the wisdom in a [talk](https://www.youtube.com/watch?v=Q0vrQFyAdWI) by the implementors of Ableton to avoid locks and context switches within the rendering cycle. — luqui, Dec 18 '19 at 19:33
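
To make the `_mm_pause()` suggestion from the comments concrete, here is a minimal sketch of what could go in the `else` branch. It is an illustration under assumptions, not a measured best practice: `cpu_relax`, `SpinBackoff`, `worker_loop` and the burst cap of 64 are hypothetical names and values that do not come from the question, and the queue is only assumed to expose the `remove(Job*)` call shown there. The helper issues a spin-wait hint (PAUSE on x86, YIELD on 64-bit ARM) and never enters the OS; the backoff doubles the number of hints per idle iteration up to a cap, which trades some wake-up latency for less pressure on a sibling hyperthread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER)
  #include <intrin.h>      // _mm_pause (x86) and __yield (ARM) on MSVC
#elif defined(__x86_64__) || defined(__i386__)
  #include <immintrin.h>   // _mm_pause on GCC/Clang
#endif

// Spin-wait hint to the core. Stays in user space; no system call is made.
inline void cpu_relax() noexcept {
#if defined(_M_IX86) || defined(_M_X64) || defined(__x86_64__) || defined(__i386__)
    _mm_pause();                                  // x86 PAUSE instruction
#elif defined(_MSC_VER) && (defined(_M_ARM64) || defined(_M_ARM))
    __yield();                                    // ARM YIELD hint (MSVC)
#elif defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");   // ARM YIELD hint (GCC/Clang)
#else
    // Assumption: no known hint for this target; compiler barrier only.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical helper: each idle iteration issues a burst of hints, and the
// burst doubles up to a cap. The cap (64) is an arbitrary value to tune.
class SpinBackoff {
public:
    static constexpr std::uint32_t kMaxBurst = 64;

    void pause() noexcept {
        assert(burst_ >= 1 && burst_ <= kMaxBurst);
        for (std::uint32_t i = 0; i < burst_; ++i) cpu_relax();
        if (burst_ < kMaxBurst) burst_ *= 2;
    }
    void reset() noexcept { burst_ = 1; }
    std::uint32_t burst() const noexcept { return burst_; }

private:
    std::uint32_t burst_ = 1;
};

// The loop from the question with the else branch filled in.
// Queue is assumed to provide: bool remove(Job*).
template <class Job, class Queue>
void worker_loop(const std::atomic<bool>& done, Queue& workQueue) {
    SpinBackoff backoff;
    while (!done.load(std::memory_order_acquire)) {
        Job job;
        if (workQueue.remove(&job)) {
            job.run();
            backoff.reset();   // work arrived: go back to the shortest wait
        } else {
            backoff.pause();   // idle: hint the core, never call into the OS
        }
    }
}
```

With the types from the question this would be called as `worker_loop<Job>(done, workQueue)`. Whether the backoff helps or hurts compared with a single hint per iteration depends on the platform and workload, so it is worth measuring both.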
