0

I have some code that works like this:

std::queue<int> a_queue;
bool exit = false;

void MainThreadFunc(int somedata)
{
    a_queue.push(somedata);
}

void WorkerThreadFunc()
{
    while (true)
    {
        if (exit)
            return;

        while (a_queue.empty());

        DoSomethingWithData(a_queue.front());
        a_queue.pop();
    }
}

The problem is that I get really high CPU usage, which appear to be a result of the spinlock when there's nothing for the worker to do. I attempted to use a mutex, but I it'd require the main thread locking it when there's nothing in the queue (which can't happen, of course). What alternatives are there to prevent this?

user112513312
  • 459
  • 1
  • 7
  • 15
  • 1
    Since you don't want to use synchronization objects, use sleep `while(a_queue.empty()) std::this_thread::sleep_for(std::chrono::milliseconds(1));` Are you sure that condition variable isn't suitable for you? – AlexStepanov Nov 14 '15 at 07:58
  • Can you put a wrapper around the queue that handles the synchronization? – Weak to Enuma Elish Nov 14 '15 at 08:03
  • @AlexStepanov This is for a game project, which has a "deadline" of about 16ms between frames, so waiting 1ms isn't an option. – user112513312 Nov 14 '15 at 08:04
  • 2
    A condition variable/monitor is normally ideal for this kind of case -- if you have super tight needs where even that is ill-suited then it's typically better to just burn cycles with a CAS loop/spinlock. It's pretty tough to satisfy the highest latency demands and obsess equally about CPU utilization simultaneously. –  Nov 14 '15 at 08:07
  • 1
    With such a deadline you cannot rely on either synchronization mechanism, or sleep. That's because they all yield their execution to the system thread scheduler, and you cannot determine when you will be executed next time. Only a busyspin loop can help here. – AlexStepanov Nov 14 '15 at 08:11
  • 2
    If it's for a game, I think unreasonable CPU usage is reasonable. – Weak to Enuma Elish Nov 14 '15 at 08:11
  • 1
    You need a lock anyways, otherwise concurrent modification of the queue will cause undefined behavior. Or an alternative data structure (e.g. "lock-free queue"). –  Nov 14 '15 at 09:05
  • Is 5-10us an option? That it typical inter-thread comms latency with a condvar, event or semaphore. A busy-loop is typically a disaster for CPU cycles and memory-bandwidth. – Martin James Nov 14 '15 at 15:30
  • The only time that a spinlock is justified is when the expected lock time is very short, eg. when used as a critical-section for pushing/popping a pointer from a queue. If you often have to wait for an extended period, (longer than, say 10us), then a spinlock just sucks, (your CPU cycles and memory bandwidth away). – Martin James Nov 14 '15 at 15:40

3 Answers3

1

The code below is what I learnt before somewhere else. It is a blocking queue implement. Threads can put elements to the queue safely, and if a thread attempts to take element from the queue when it is empty, it will be blocked until some other thread put elements to the queue. Hope it can help you

#include <queue>
#include <cassert>
#include <mutex>
#include <condition_variable>
#include <thread>
template<typename T>
class BlockingQueue
{
private:
    std::mutex _mutex;
    std::condition_variable _condvar;
    std::queue<T> _queue;
public:
    BlockingQueue(): _mutex(),_condvar(),_queue()
    {

    }
    BlockingQueue(const BlockingQueue& rhs) = delete;
    BlockingQueue& operator = (const BlockingQueue& rhs) = delete;

    void Put(const T& task)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _queue.push(task);
        }
        _condvar.notify_all();
    }

    T Take()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _condvar.wait(lock,[this]{return !_queue.empty(); });
        assert(!_queue.empty());
        T front(std::move(_queue.front()));
        _queue.pop();

        return front;
    }

};
Wang Weiyi
  • 86
  • 5
0

Given the stated requirements where the cost of thread scheduler/context switch is too expensive, typically the best bet is to simply burn cycles as you do now to meet the tightest latency demands.

An atomic CAS spin/busy loop is basically a polling method, and as is commonly associated with polling, it has a tendency to hog CPU. Yet it doesn't pay the cost of the scheduler which you want to avoid here with that tight 16ms deadline to do everything you need to do and deliver a frame. It's typically your best bet to meet this kind of deadline.

With games where you have a lively world and constantly-animated content, there typically aren't lengthy idle periods where nothing is happening. As a result, it's generally considered quite acceptable for a game to be constantly utilizing CPU.

Probably a more productive question given your requirements is not how to reduce CPU utilization so much as to make sure the CPU utilization is going towards a good purpose. Typically a concurrent queue can offer a lock-free, non-blocking query to check to see if the queue is empty, which you already seem to have given this line:

while (a_queue.empty());

You might be able to sneak in some things to compute here while the queue is empty. This way you're not burning up cycles merely busy-waiting on the queue to become non-empty.

Again, typically the ideal answer to your question would involve a condition variable (which does depend on context switches and threads being woken up). It will typically be the fastest way to put the thread to sleep (reducing the CPU utilization) and have it woken up at the desired time, but given the tight latency requirements, it might be best to forget about the CPU utilization and focus more on making sure it's going towards a good purpose.

  • 'Given the stated requirements where the cost of thread scheduler/context switch is too expensive,' - where does it say that? It mentions 16ms, which is an eternity compared with typical inter-thread comms kernel signaling latencies of 5-10us. – Martin James Nov 14 '15 at 15:34
  • Just from the discussion before -- there was mention that even condition variables were undesirable. I'm assuming for such a tight latency demand that whatever overhead of a condition variable/monitor would multiply prior to delivering a frame (though maybe it is better to just use one and time it/see). In any case, I just wanted to offer that if even a condition variable is too slow, then the typical alternative is to just burn cycles with a busy loop. It seems impossible to get that obsessive about both latency and lower CPU usage at the same time. –  Nov 14 '15 at 23:42
  • Though I'm 99% in agreement with you. I'd prefer to just use the condition variable and not worry about it until there's a measured reason to do so. The 1% might just be that I think there is one other case where a spinloop might be justified, and that's when your notion of speed isn't based on averages but on consistency/stability, as might be common in a game where we don't want any remote possibility of stutter -- "consistent speed" rather than simply "fast". Though 5-10 us is really getting to an unhealthy obsession territory -- I suppose the only point I wanted to offer is that we... –  Nov 14 '15 at 23:56
  • ... typically can't have it both ways. If the overhead of a context switch is "too expensive", then the alternative is the spin loop which burns cycles. –  Nov 14 '15 at 23:56
  • Not only does it burn CPU cycles, it also burns memory-bandwidth, adversely affecting memory shared with other cores:( – Martin James Nov 28 '15 at 08:22
  • @MartinJames Yeah, my normal suggestion would be try it and see. If the desire to avoid a condition variable is superstition and it wasn't actually tried yet, my general suggestion would be try it and see, measure. CAS loops tend to be better saved as an optimization detail for cases that benefit from it, since far more cases don't. I just noticed in the early comment interchange that sync objects and context switches were undesirable here for some reason, at which point this becomes the main alternative. –  Nov 28 '15 at 15:42
0

std::queue is not thread-safe. As @Hurkyl posted, you need a lock if you want to use std::queue.

As coded, if multiple threads are waiting for data in the queue, they could all act on the same value from a_queue.fronf(), then call a_queue.pop() a number of times unrelated to the number of values pushed on the queue.

See Thread safety for STL queue

Community
  • 1
  • 1
Andrew Henle
  • 32,625
  • 3
  • 24
  • 56