c++ multithreading and affinity

Question

I'm writing a simple thread pool for my application, which I test on a dual-core processor. It usually works well, but I noticed that when other processes use more than 50% of the processor, my application almost halts. This made me curious, so I reproduced the situation with an auxiliary application that simply runs an infinite loop (single-threaded), taking 50% of the processor. While the auxiliary application is running, the multithreaded application almost halts, as before (processing speed falls from 300-400 tasks per second to 5-10 tasks per second). But when I changed the process affinity of my multithreaded program to use only one core (the auxiliary one can still use both), it started working again, of course using at most the 50% of the processor that was left. When I disabled multithreading in my application (still processing the same tasks, but without the thread pool), it worked like a charm, with no slowdown from the auxiliary application, which was still running (and that's how two applications should behave when running on two cores). But when I enable multithreading, the problem comes back.
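
The auxiliary program itself is not shown in this post. For anyone trying to reproduce the setup, a single-threaded load generator could look roughly like the sketch below. The name burnCpu and the time limit are assumptions made for illustration; the original simply loops forever.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the auxiliary program: keeps one core busy
// on the calling thread until the given duration has elapsed, then
// returns the number of loop iterations performed.
std::uint64_t burnCpu(std::chrono::milliseconds duration)
{
    const auto deadline = std::chrono::steady_clock::now() + duration;

    // volatile keeps the compiler from optimizing the loop body away
    volatile std::uint64_t iterations = 0;

    while (std::chrono::steady_clock::now() < deadline)
        iterations = iterations + 1;

    return iterations;
}
```

Calling it with a very large duration from main approximates the infinite loop; on a dual-core machine one such thread shows up as roughly 50% total CPU usage.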

I've written some code specifically for testing this ThreadPool:

header

#ifndef THREADPOOL_H_
#define THREADPOOL_H_

typedef double FloatingPoint;

#include <queue>
#include <vector>

#include <mutex>
#include <atomic>
#include <condition_variable>
#include <thread>

using namespace std;

struct ThreadTask
{
    int size;

    ThreadTask(int s)
    {
        size = s;
    }
    ~ThreadTask()
    {
    }
};

class ThreadPool
{
protected:
    queue<ThreadTask*> tasks;
    vector<std::thread> threads;
    std::condition_variable task_ready;
    std::mutex variable_mutex;
    std::mutex max_mutex;

    std::atomic<FloatingPoint> max;
    std::atomic<int> sleeping;
    std::atomic<bool> running;

    int threads_count;

    ThreadTask * getTask();
    void runWorker();
    void processTask(ThreadTask*);
    bool isQueueEmpty();
    bool isTaskAvailable();
    void threadMethod();
    void createThreads();
    void waitForThreadsToSleep();
public:
    ThreadPool(int);
    virtual ~ThreadPool();

    void addTask(int);
    void start();
    FloatingPoint getValue();

    void reset();
    void clearTasks();
};

#endif /* THREADPOOL_H_ */

and .cpp

#include "stdafx.h"
#include <climits>
#include <float.h>
#include <cstdio>   //printf

#include "ThreadPool.h"

ThreadPool::ThreadPool(int t)
{
    running = true;
    threads_count = t;
    max = -DBL_MAX;     //lowest finite double (FLT_MIN is the smallest positive float, not the lowest value)
    sleeping = 0;

    if(threads_count < 2)                                       //a single worker thread makes no sense
    {
        threads_count = (int)thread::hardware_concurrency();    //default value

        if(threads_count == 0)  //in case it fails ('If this value is not computable or well defined, the function returns 0')
            threads_count = 2;
    }

    printf("%d worker threads\n", threads_count);
}

ThreadPool::~ThreadPool()
{
    running = false;

    reset();                    //it will make sure that all worker threads are sleeping on condition variable
    task_ready.notify_all();    //let them finish in natural way

    for (auto& th : threads)
        th.join();
}

void ThreadPool::start()
{
    createThreads();
}

FloatingPoint ThreadPool::getValue()
{
    waitForThreadsToSleep();

    return max;
}

void ThreadPool::createThreads()
{
    threads.clear();

    for(int i = 0; i < threads_count; ++i)
        threads.push_back(std::thread(&ThreadPool::threadMethod, this));
}

void ThreadPool::threadMethod()
{
    while(running)
        runWorker();
}

void ThreadPool::runWorker()
{
    ThreadTask * task = getTask();
    processTask(task);
}

void ThreadPool::processTask(ThreadTask * task)
{
    if(task == NULL)
        return;

    //do something to simulate processing

    vector<int> v;

    for(int i = 0; i < task->size; ++i)
        v.push_back(i);

    delete task;
}

void ThreadPool::addTask(int s)
{
    ThreadTask * task = new ThreadTask(s);

    std::lock_guard<std::mutex> lock(variable_mutex);
    tasks.push(task);

    task_ready.notify_one();
}

ThreadTask * ThreadPool::getTask()
{
    std::unique_lock<std::mutex> lck(variable_mutex);

    if(tasks.empty())
    {
        ++sleeping;
        task_ready.wait(lck);
        --sleeping;
        if(tasks.empty())   //in case of ThreadPool being deleted (destructor calls notify_all), or spurious notifications
            return NULL;    //return to main loop and repeat it
    }

    ThreadTask * task = tasks.front();
    tasks.pop();

    return task;
}

bool ThreadPool::isQueueEmpty()
{
    std::lock_guard<std::mutex> lock(variable_mutex);

    return tasks.empty();
}

bool ThreadPool::isTaskAvailable()
{
    return !isQueueEmpty();
}

void ThreadPool::waitForThreadsToSleep()
{
    while(isTaskAvailable())
        std::this_thread::yield();  //wait for all tasks to be taken
    while(true) //wait for all threads to finish their last tasks
    {
        if(sleeping == threads_count)
            break;

        std::this_thread::yield();
    }
}

void ThreadPool::clearTasks()
{
    std::unique_lock<std::mutex> lock(variable_mutex);

    while(!tasks.empty()) tasks.pop();
}

void ThreadPool::reset()    //don't call this when variable_mutex is already locked by this thread!
{
    clearTasks();

    waitForThreadsToSleep();

    max = -DBL_MAX;     //lowest finite double (FLT_MIN is the smallest positive float, not the lowest value)
}

how it's tested:

ThreadPool tp(2);
tp.start();

int iterations = 1000;
int task_size = 1000;

for(int j = 0; j < iterations; ++j)
{
    printf("\r%d left", iterations - j);

    tp.reset();
    for(int i = 0; i < 1000; ++i)
        tp.addTask(task_size);

    tp.getValue();  
}


return 0;

I've built this code with MinGW (gcc 4.8.1) and Visual Studio 2012 (VC11) on 64-bit Windows 7, both in debug configuration.

The two programs built with these compilers behave completely differently.

a) The program built with MinGW runs much faster than the one built with VS when it can use the whole processor (the system shows almost 100% CPU usage by this process, so I don't think MinGW is secretly setting affinity to one core). But when I run the auxiliary program (using 50% of the CPU), it slows down greatly (by a factor of several dozen). CPU usage in this case is split about 50%-50% between the main program and the auxiliary one.

b) The program built with VS 2012, when using the whole CPU, is even slower than a) is under the slowdown (when I set task_size = 1, their speeds were similar). But when the auxiliary program is running, the main program even takes most of the CPU (usage is about 66% main, 33% aux) and the resulting slowdown is barely noticeable.

When set to use only one core, both programs speed up noticeably (about 1.5-2 times), and the MinGW one is no longer vulnerable to competition.

Well, now I don't know what to do. My program behaves differently when built with two different toolsets. Is this a flaw in my code (which I suppose is the case), or does it have something to do with the compilers having problems with C++11?

It's not clear exactly what you are doing in the thread itself. Or what your "load the system" task is doing. - Mats Petersson, Sep 18 '13 at 20:11
Can you reproduce the problem on a simpler/minimal example? - brunocodutra, Sep 18 '13 at 20:25
Try this with Visual Studio (minus your math library). I have no idea how your `std::thread` is implemented, but last I checked, mingw doesn't support pthreads. Another possibility is that mingw sets its affinity to one processor, and that gets propagated to your process, if you're running it from mingw. - Collin Dauphinee, Sep 18 '13 at 22:49
I've modified the post in response to your comments. The code is also slightly simplified, so now you see all of the code in use. - Zwierzolak, Sep 19 '13 at 20:03
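
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: waitForThreadsToSleep() spins on std::this_thread::yield(), so the calling thread stays runnable the whole time and competes for a core with the two workers and the auxiliary process. A minimal sketch of a pool whose caller blocks on a second condition variable instead is shown below. IdleAwarePool, waitUntilIdle, all_idle and pending are hypothetical names introduced here, not part of the original ThreadPool, and tasks are std::function<void()> rather than ThreadTask.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, not the ThreadPool from the question.
class IdleAwarePool
{
    std::queue<std::function<void()>> tasks;
    std::vector<std::thread> threads;
    std::mutex m;
    std::condition_variable task_ready; // signalled when work arrives or on shutdown
    std::condition_variable all_idle;   // signalled when the last pending task finishes
    int pending = 0;                    // queued + in-flight tasks, guarded by m
    bool running = true;                // guarded by m

    void worker()
    {
        std::unique_lock<std::mutex> lck(m);
        while (true)
        {
            // The predicate form re-checks the condition after spurious wakeups.
            task_ready.wait(lck, [this] { return !running || !tasks.empty(); });
            if (tasks.empty())
                return;                 // only reachable once running == false

            std::function<void()> task = std::move(tasks.front());
            tasks.pop();

            lck.unlock();               // run the task without holding the lock
            task();
            lck.lock();

            if (--pending == 0)
                all_idle.notify_all();  // wake anyone blocked in waitUntilIdle()
        }
    }

public:
    explicit IdleAwarePool(int n)
    {
        for (int i = 0; i < n; ++i)
            threads.emplace_back(&IdleAwarePool::worker, this);
    }

    ~IdleAwarePool()
    {
        {
            std::lock_guard<std::mutex> lck(m);
            running = false;
        }
        task_ready.notify_all();        // workers drain the queue, then exit
        for (auto& th : threads)
            th.join();
    }

    void addTask(std::function<void()> f)
    {
        {
            std::lock_guard<std::mutex> lck(m);
            tasks.push(std::move(f));
            ++pending;
        }
        task_ready.notify_one();
    }

    // Blocks without consuming CPU until every submitted task has finished.
    void waitUntilIdle()
    {
        std::unique_lock<std::mutex> lck(m);
        all_idle.wait(lck, [this] { return pending == 0; });
    }
};
```

In the test loop from the question, waitUntilIdle() would take the place of getValue()/reset(). If the slowdown under load disappears with a blocking wait, the yield loop was likely a contributor; if not, the cause lies elsewhere.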
