Thread management in a game loop?

Question

I am in the middle of developing a game and came across the problem of multithreading. I already used multithreading successfully when loading resources. I did that by creating some threads, assigning them functions, and waiting for them to finish while drawing a loading screen; pretty straightforward.

Now I want to create some threads that wait idle until they receive a function, run it, and then go idle again. They must operate in a game loop, which is roughly like this (I came up with these function names just for easy visualization):

std::thread t0,t1;
while(gamerunning)
{
   UpdateGame();
   t0.receiveFunc( RenderShadow );
   t1.receiveFunc( RenderScene );
   WaitForThreadstoFinishWork();
   RenderEverything(); //Only draw everything if the threads finished (D3D11's Deferred Context rendering)
}
t0.Destroy();
t1.Destroy();

My rendering engine is working, and for the time being (for testing) I create threads inside my game loop, which is a terrible approach even for a quick test, because my rendering actually slowed down. By the way, I am using C++11's thread library.

Long story short, I want to create threads before my game loop starts and use them inside the loop afterwards. I hope someone can help me out. If possible, I would really like to stay away from the lower levels of threading; I just need the most straightforward way of doing this.

So, you want a pool of worker threads that can accept arbitrary tasks (functions) to be executed, right? Just use a thread-safe message queue that takes functions (`queue<function<void ()>> + mutex + condition_variable` or something like this) along with as many threads as you need (typically, twice the number of CPU cores of the machine if you want to saturate your CPUs). — syam, Aug 21 '13 at 22:58
However I don't really get the second half of your comment, about that message queue for now, but I think you set me in the right direction with thread pool. I will look into it and then return. :) — János Turánszki, Aug 21 '13 at 23:02
Well this is a very common pattern so it shouldn't be too hard to find online. But if I wanted to explain it to you correctly it would take me a lot of time, which is why I only gave you hints. ;) If you really can't figure it out, just ping me here (add a @syam comment) and I'll see what I can do. :) — syam, Aug 21 '13 at 23:09
Note that this pattern is usually called Producer/Consumer. This may help you find the correct resources, along with the other keywords I already gave you. :) — syam, Aug 21 '13 at 23:12
@syam I have read into the problem and my fears came true, because it seems like a pretty heavy topic. I haven't even used mutexes or anything like that; I just ensured that I do not call anything which is not thread-safe. Do you think it is any good if I create threads every frame (60 frames per second for example) and then destroy them when they have finished the given function? I have a fixed set of 7 functions which I want to run on 7 separate threads every frame. I hope this can simplify it, because what I have read so far concentrates on managing an arbitrary number of threads. — János Turánszki, Aug 22 '13 at 14:17
Starting and stopping threads is quite an expensive operation, so I certainly wouldn't do that 420 (7*60) times per second. However, if your set of functions is fixed then you can indeed simplify the stuff and get rid of message queues altogether. However you'll still have to use a bool/mutex/condition_variable triplet to wake up the threads. Just give me a few minutes to write an example. — syam, Aug 22 '13 at 14:44
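
For reference, the queue-based worker pool hinted at in the comments above (`queue + mutex + condition_variable`, the Producer/Consumer pattern) could look roughly like the sketch below. The class name `WorkerPool` and its `submit()`/`waitAll()` members are hypothetical names chosen for illustration; they do not come from the discussion, and this is only one possible arrangement under C++11.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: WorkerPool is an illustrative name, not from the thread.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i)
            m_workers.emplace_back([this] { run(); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_wake.notify_all();
        for (auto& w : m_workers) w.join();
    }

    // producer side: queue a task and wake one idle worker
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(task));
            ++m_pending;
        }
        m_wake.notify_one();
    }

    // block until every submitted task has finished
    void waitAll() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_done.wait(lock, [this] { return m_pending == 0; });
    }

private:
    // consumer side: sleep until there is a task or a stop request
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_tasks.empty()) return; // stop requested, nothing left to do
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            try { task(); } catch (...) {} // keep exceptions inside the worker
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                --m_pending;
            }
            m_done.notify_all();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;  // signals "task available or stop"
    std::condition_variable m_done;  // signals "a task finished"
    std::queue<std::function<void()>> m_tasks;
    std::size_t m_pending = 0;
    bool m_stop = false;
    std::vector<std::thread> m_workers;
};
```

In a game loop this would be used by calling `submit()` once per task each frame and then `waitAll()` before rendering. With a fixed set of tasks, the simpler one-thread-per-task design is probably enough.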

syam · Accepted Answer · 2013-08-22T17:06:28.547

Following your most recent comments, here is an example implementation of a thread that wakes up on demand, runs its corresponding task and then goes back to sleep, along with the necessary functions to manage it (wait for task completion, ask for shutdown, wait for shutdown).

Since your set of functions is fixed, all you'll have left to do is to create as many threads as you need (i.e. 7, probably in a vector), each with its own corresponding task.

Note that once you remove the debugging couts there's little code left, so I don't think there is a need to explain the code (it's pretty self-explanatory IMHO). However don't hesitate to ask if you need explanations on some details.

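#include <atomic>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <utility>

class TaskThread {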
public:
    TaskThread(std::function<void ()> task)
      : m_task(std::move(task)),
        m_wakeup(false),
        m_stop(false),
        m_thread(&TaskThread::taskFunc, this)
    {}
    ~TaskThread() { stop(); join(); }

    // wake up the thread and execute the task
    void wakeup() {
        auto lock = std::unique_lock<std::mutex>(m_wakemutex);
        std::cout << "main: sending wakeup signal..." << std::endl;
        m_wakeup = true;
        m_wakecond.notify_one();
    }
    // wait for the task to complete
    void wait() {
        auto lock = std::unique_lock<std::mutex>(m_waitmutex);
        std::cout << "main: waiting for task completion..." << std::endl;
        while (m_wakeup)
          m_waitcond.wait(lock);
        std::cout << "main: task completed!" << std::endl;
    }

    // ask the thread to stop
    void stop() {
        auto lock = std::unique_lock<std::mutex>(m_wakemutex);
        std::cout << "main: sending stop signal..." << std::endl;
        m_stop = true;
        m_wakecond.notify_one();
    }
    // wait for the thread to actually be stopped
    void join() {
        std::cout << "main: waiting for join..." << std::endl;
        m_thread.join();
        std::cout << "main: joined!" << std::endl;
    }

private:
    std::function<void ()> m_task;

    // wake up the thread
    std::atomic<bool> m_wakeup;
    bool m_stop;
    std::mutex m_wakemutex;
    std::condition_variable m_wakecond;

    // wait for the thread to finish its task
    std::mutex m_waitmutex;
    std::condition_variable m_waitcond;

    std::thread m_thread;

    void taskFunc() {
        while (true) {
            {
                auto lock = std::unique_lock<std::mutex>(m_wakemutex);
                std::cout << "thread: waiting for wakeup or stop signal..." << std::endl;
                while (!m_wakeup && !m_stop)
                    m_wakecond.wait(lock);
                if (m_stop) {
                    std::cout << "thread: got stop signal!" << std::endl;
                    return;
                }
                std::cout << "thread: got wakeup signal!" << std::endl;
            }

            std::cout << "thread: running the task..." << std::endl;
            // you should probably do something cleaner than catch (...)
            // just ensure that no exception propagates from m_task() to taskFunc()
            try { m_task(); } catch (...) {}
            std::cout << "thread: task completed!" << std::endl;

            std::cout << "thread: sending task completed signal..." << std::endl;
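            // m_wakeup is atomic, but it must still be cleared under m_waitmutex:
            // otherwise wait() could miss the notification and block forever
            {
                std::lock_guard<std::mutex> lock(m_waitmutex);
                m_wakeup = false;
            }
            m_waitcond.notify_all();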
        }
    }
};

int main()
{
    // example thread, you should really make a pool (eg. vector<unique_ptr<TaskThread>>,
    // since TaskThread itself can be neither copied nor moved)
    TaskThread thread([]() { std::cout << "task: running!" << std::endl; });

    for (int i = 0; i < 2; ++i) { // dummy example loop
      thread.wakeup();
      // wake up other threads in your thread pool
      thread.wait();
      // wait for other threads in your thread pool
    }
}

Here's what I get (actual order varies from run to run depending on thread scheduling):

main: sending wakeup signal...
main: waiting for task completion...
thread: waiting for wakeup or stop signal...
thread: got wakeup signal!
thread: running the task...
task: running!
thread: task completed!
thread: sending task completed signal...
thread: waiting for wakeup or stop signal...
main: task completed!
main: sending wakeup signal...
main: waiting for task completion...
thread: got wakeup signal!
thread: running the task...
task: running!
thread: task completed!
thread: sending task completed signal...
thread: waiting for wakeup or stop signal...
main: task completed!
main: sending stop signal...
main: waiting for join...
thread: got stop signal!
main: joined!
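
To illustrate the pool usage suggested in the code comments (wake every thread first, then wait for each one), here is a minimal sketch. It is self-contained, so it includes a condensed variant of `TaskThread` with the debug output removed and a single mutex; that condensed form is a simplification made for this example, not the code from the answer. The helper `runFrames()` and its parameters are hypothetical.

```cpp
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Condensed variant of TaskThread (assumption: one mutex, no debug output).
class TaskThread {
public:
    explicit TaskThread(std::function<void()> task)
      : m_task(std::move(task)), m_thread(&TaskThread::taskFunc, this) {}
    ~TaskThread() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_stop = true; }
        m_cond.notify_all();
        m_thread.join();
    }
    // ask the thread to run its task once
    void wakeup() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_wakeup = true; }
        m_cond.notify_all();
    }
    // block until the task has completed
    void wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_wakeup; });
    }
private:
    void taskFunc() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_cond.wait(lock, [this] { return m_wakeup || m_stop; });
            if (m_stop) return;
            lock.unlock();                    // run the task without the lock
            try { m_task(); } catch (...) {}
            lock.lock();
            m_wakeup = false;                 // cleared under the mutex
            m_cond.notify_all();
        }
    }
    std::function<void()> m_task;
    bool m_wakeup = false;
    bool m_stop = false;
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::thread m_thread; // last member: starts once everything else exists
};

// Hypothetical helper: each "frame" wakes all workers, then waits for all.
// Returns how many task executions happened in total.
int runFrames(int frames, int workers) {
    std::vector<int> counts(workers, 0);
    std::vector<std::unique_ptr<TaskThread>> pool;
    for (int i = 0; i < workers; ++i)
        pool.push_back(std::unique_ptr<TaskThread>(
            new TaskThread([&counts, i] { ++counts[i]; })));
    for (int f = 0; f < frames; ++f) {
        for (auto& t : pool) t->wakeup(); // start everything first...
        for (auto& t : pool) t->wait();   // ...then wait, so tasks overlap
    }
    int total = 0;
    for (int c : counts) total += c;
    return total;
}
```

Waking and waiting on one thread at a time would serialize the tasks. Separating the two loops is what allows them to run concurrently.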

This looks awesome, I can not thank you enough! I really needed a lightweight solution like this. :) — János Turánszki, Aug 22 '13 at 15:56
You're welcome. Note that I updated the code, there was a concurrency issue with `wait()` since it uses a different mutex than the rest, so I made `m_wakeup` a `std::atomic` to fix it (along with a few other changes to avoid the main thread stepping on the task thread's toes). — syam, Aug 22 '13 at 16:08
I tried it out and it works like a charm, but now I tried your corrected code, which freezes up my program completely after a few seconds of running (just when waking and waiting). — János Turánszki, Aug 22 '13 at 16:54
Ouch... The fix seems worse than the original problem. :( Do you perchance call either `wakeup()` or `stop()` from inside your tasks? Anyway, I think I know where this comes from: you should limit the scope of the lock in `taskFunc()` like it was before (see updated code). Beware though, calling `wakeup()` while a task is already running will be a no-op (ie. it won't start another task when the current one is finished). — syam, Aug 22 '13 at 17:08
Yes I called them. The first code works, then I just replaced the class and that one doesn't. And by the way, doing a bit of performance testing, it is much faster than creating a thread in every frame, but the same framerate as rendering on a single thread. Now I get that this topic is not about rendering at all, but I am wondering how I could check if the operating system runs the threads separately? (just if you have the time, you already helped so much :) ) — János Turánszki, Aug 22 '13 at 17:18
Just to be sure: are you calling first all your threads' `wakeup()` and then only start calling the `wait()` functions? Also, how many CPU cores do you have on your machine? If you only have one, no amount of threading will help when the tasks are CPU-bound (as opposed to IO-bound). To check if the tasks actually run in parallel, I'd use whatever task manager at hand and look at the CPU usage per core, and check how many cores get busy when you run your tasks. — syam, Aug 22 '13 at 17:35
Also, make sure to remove or comment out the debugging `cout` lines, they slow down the program **a lot** (I only put them to help you understand how the code works, but they must be removed in performance-critical code). — syam, Aug 22 '13 at 17:38
Yes, I call wake then wait. Sadly, I am running windows 8 where I can't see the cpu usage per core in the task manager. (and I have a core i7 with 8 cores) — János Turánszki, Aug 22 '13 at 18:28
Try [ProcessExplorer](http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx), IIRC it has a per-core monitor somewhere. — syam, Aug 22 '13 at 18:41
I tried that, and by the way I managed to turn on the display for multiple cores on Windows 8, and it seems it does not increase clock frequencies for the other cores (and it does when I create threads every frame). However, I guess it might even be because of the worker functions (they only gather the draw calls, which are executed on the GPU from the main thread once they have all returned, so it is only CPU work). Oh, and my "threadpool" is like this: `std::vector<TaskThread*>`. Could it be that something is off with the vector? — János Turánszki, Aug 22 '13 at 19:24
Concerning the vector, I would store `unique_ptr`s instead of raw pointers in order to avoid managing the memory manually, but that's a detail and has nothing to do with your problem. As to your performance issue, I can easily saturate all my CPU cores using the very same TaskThread so the problem is likely to come from your tasks themselves. — syam, Aug 22 '13 at 21:44
I will try to simulate heavier load then. Again, thanks for the help! — János Turánszki, Aug 22 '13 at 22:30
Okay, it runs in parallel, I am just not gaining the performance I expected batching up the draw calls. At this point it seems my GPU is the bottleneck. — János Turánszki, Aug 23 '13 at 07:33
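
Besides watching per-core usage in a task manager, one rough way to check whether tasks really overlap is to time the same CPU-bound work run sequentially and then on separate threads. The sketch below does that. `burn()`, `Timing` and `compare()` are hypothetical names, and spawning fresh threads here is only acceptable because it is a one-off measurement, not something to do every frame.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU-bound stand-in for a real task.
std::uint64_t burn(std::uint64_t n) {
    std::uint64_t x = 0;
    for (std::uint64_t i = 0; i < n; ++i) x += i * i % 7;
    return x;
}

struct Timing {
    double serialMs;    // wall-clock time when tasks run one after another
    double parallelMs;  // wall-clock time when each task has its own thread
    bool sameResult;    // sanity check: both runs computed the same values
};

Timing compare(unsigned tasks, std::uint64_t work) {
    using clock = std::chrono::steady_clock;
    std::vector<std::uint64_t> a(tasks), b(tasks);

    auto t0 = clock::now();
    for (unsigned i = 0; i < tasks; ++i) a[i] = burn(work);
    auto t1 = clock::now();

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < tasks; ++i)
        threads.emplace_back([&b, i, work] { b[i] = burn(work); });
    for (auto& t : threads) t.join();
    auto t2 = clock::now();

    std::chrono::duration<double, std::milli> s = t1 - t0;
    std::chrono::duration<double, std::milli> p = t2 - t1;
    return { s.count(), p.count(), a == b };
}
```

If `parallelMs` comes out clearly smaller than `serialMs` for a large enough `work` value, the threads are running on separate cores. If the two are similar, either the machine has a single core or the work is too small to outweigh thread startup. `std::thread::hardware_concurrency()` gives a hint of how many threads the hardware can run at once, though it may return 0 when the value is unknown.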
