
I have a multi-threaded application. Each thread initializes a struct in its own thread-local storage, and elements are added to the vectors inside these structs as the program runs. At the end of the program, I would like to iterate over these thread-local structs and add all the results together. How can I iterate over the instances held by a thread-specific pointer so that I can sum the results from all the threads?

Thanks in advance.

boost::thread_specific_ptr<testStruct> tss;

size_t x = 10;

void callable(string str, int x) {
    if(!tss.get()){
        tss.reset(new testStruct);
        (*tss).xInt.resize(x, 0);
    }
    // Assign some values to the vector elements after doing some calculations
}

Example:

#include <iostream>
#include <vector>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>

#define NR_THREAD 4
#define SAMPLE_SIZE 500

using namespace std;

static bool busy = false;

struct testStruct{
    vector<int> intVector;
};

boost::asio::io_service ioService;
boost::thread_specific_ptr<testStruct> tsp;
boost::condition_variable cond;
boost::mutex mut;

void callable(int x) {
    if(!tsp.get()){
        tsp.reset(new testStruct);
    }

    (*tsp).intVector.push_back(x);

    if (x + 1 == SAMPLE_SIZE){
        busy = true;
        cond.notify_all();
    }
}

int main() {
    boost::thread_group threads;
    size_t (boost::asio::io_service::*run)() = &boost::asio::io_service::run;
    boost::asio::io_service::work work(ioService);

    for (short int i = 0; i < NR_THREAD; ++i) {
        threads.create_thread(boost::bind(run, &ioService));
    }

    size_t iterations = 10;
    for (int i = 0; i < iterations; i++) {
        busy = false;

        for (short int j = 0; j < SAMPLE_SIZE; ++j) {
            ioService.post(boost::bind(callable, j));
        }

        // all threads need to finish the job for the next iteration
        boost::unique_lock<boost::mutex> lock(mut);
        while (!busy) {
            cond.wait(lock);
        }
        cout << "Iteration: " << i << endl;
    }

    vector<int> sum(SAMPLE_SIZE, 0);    // sum up all the values from thread local storages

    work.~work();
    threads.join_all();

    return 0;
}
serhatg
  • You just write that logic yourself. Looks like a run-of-the-mill consolidation step. Make the function return non-void instead and sum the values of futures, e.g. – sehe Jun 16 '15 at 11:38
  • Each thread calls the function N times where N is not fixed and can go up to millions of times. Let's say I have 8 threads and each thread gets ~1m jobs, I think it's not a good solution to return values in that case. – serhatg Jun 16 '15 at 12:04
  • Why not? Just return the value that otherwise needs to be gotten from the TLS. Or, you use a promise if you don't want/need to join the thread. – sehe Jun 16 '15 at 14:15
  • Well, the main idea is to improve performance. That's why I would prefer not to return a value after each iteration, but to do the calculations inside each thread and, at the end, just sum a few vectors together on a single core. All the threads are joined after all the iterations are done. – serhatg Jun 16 '15 at 14:30
  • Mmm. Just repeating your assumptions doesn't make them more useful. I _get_ what you mean. I'm trying to tell you you're barking up the wrong tree. Make a SSCCE and I'll show you. (Multiple versions if you want) – sehe Jun 16 '15 at 14:33
  • Thanks @sehe. I have added a short example. – serhatg Jun 16 '15 at 14:49
  • Ow. That sample shows a _lot_ more complexity than (a) required (b) hinted at in the question. Also, `work.~work()`? I'll see what I can come up with tonight – sehe Jun 16 '15 at 15:53
  • I am calling the work destructor explicitly so that the io service stops only when there are no more jobs to post and all the jobs are done. (I should have named it a bit differently, sorry.) The actual program is more complex than this. That's why I have been looking for a way to iterate through the thread local storages. Thanks for your time. – serhatg Jun 16 '15 at 16:15
  • You can't legally call the destructor on an object with automatic storage duration like that. Anyways, I'm still at work, later. – sehe Jun 16 '15 at 16:16

1 Answer


After giving this issue some thought, I came up with the following solution:

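void accumulateTLS(size_t idxThread){

    // Take the lock first so that `busy` is only read and written under the mutex
    boost::unique_lock<boost::mutex> lock(mut);

    // Assumes the jobs are posted with indices 1..nr_threads, so this is the last one
    if (idxThread == nr_threads)
    {
        busy = true;
    }

    // Suspend every thread here until the last job has been picked up
    while (!busy)
    {
        cond.wait(lock);
    }

    // Accumulate the variables using the thread specific pointer
    // (the lock is still held, so only one thread updates the shared sum at a time)

    cond.notify_one();   // wake the next waiting thread
}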

With boost::asio::io_service, different handlers can be posted after the worker threads have been started. So, once all the calculations are done, I post one more job per thread (N jobs for N threads) with accumulateTLS(idxThread) as the handler. Every job blocks until the last one has been picked up, so each thread runs exactly one of them, and the accumulation is done inside accumulateTLS using that thread's own thread-specific pointer.

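P.S. Instead of calling work.~work() explicitly, the work object should be held in a smart pointer (for example boost::optional or boost::scoped_ptr) and released with reset().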
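As an alternative to the barrier approach above, here is a minimal sketch of a different technique: since boost::thread_specific_ptr offers no way to enumerate the per-thread instances, each thread registers its result object in a shared registry once, then writes to it without locking, and the main thread sums everything after joining. This is not the method used in the answer. It uses only the standard library (std::thread instead of Boost threads and io_service), and ThreadResult, ResultRegistry and runAndSum are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-thread accumulator, standing in for testStruct.
struct ThreadResult {
    std::vector<long> counts;
};

// Hypothetical registry that owns one ThreadResult per worker thread.
class ResultRegistry {
public:
    // Called once per thread. Only this call takes the lock; the returned
    // pointer is then used by its owning thread without synchronization.
    ThreadResult* create(std::size_t size) {
        std::unique_ptr<ThreadResult> result(new ThreadResult);
        result->counts.assign(size, 0);
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.push_back(std::move(result));
        return slots_.back().get();
    }

    // Call only after all worker threads have been joined.
    std::vector<long> sum(std::size_t size) const {
        std::vector<long> total(size, 0);
        for (const auto& slot : slots_) {
            assert(slot->counts.size() == size);
            for (std::size_t i = 0; i < size; ++i) {
                total[i] += slot->counts[i];
            }
        }
        return total;
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<ThreadResult>> slots_;
};

// Starts nThreads workers, lets each one add 1 to every element of its own
// vector jobsPerThread times, then sums all per-thread vectors on one thread.
std::vector<long> runAndSum(std::size_t nThreads, std::size_t sampleSize,
                            std::size_t jobsPerThread) {
    ResultRegistry registry;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        workers.emplace_back([&registry, sampleSize, jobsPerThread] {
            ThreadResult* local = registry.create(sampleSize);
            for (std::size_t job = 0; job < jobsPerThread; ++job) {
                for (std::size_t i = 0; i < sampleSize; ++i) {
                    local->counts[i] += 1;   // thread-private, no lock needed
                }
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();   // joining makes the workers' writes visible here
    }
    return registry.sum(sampleSize);
}
```

Calling runAndSum(4, 8, 1000) runs four workers and returns a vector of eight totals. The mutex is taken only once per thread, at registration, so the per-job cost is the same as with thread-local storage.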

serhatg