
Suppose we have the following code which counts the number of times something occurs:

int i=0;
void f() {
   // do stuff  . . .
   if(something_happens) ++i;
}

int main() {
    std::vector<std::thread> threads;
    for(int j = 0; j< std::thread::hardware_concurrency(); ++j) {
        threads.push_back(std::thread(f));
    }

    std::for_each(threads.begin(), threads.end(), std::mem_fn(&std::thread::join));
    std::cout << "i = " << i << '\n';
}

As it stands there is a clear race condition on i. Using C++11, what is (1) the easiest method for eliminating this race condition, and (2) the fastest method, preferably without using mutexes? Thanks.

Update: Following the comment suggesting atomics, I got a working program which compiles under the Intel compiler, version 13:

#include <iostream>
#include <thread>
#include <vector>
#include <atomic>
#include <algorithm>

std::atomic<unsigned long long> i(0); // note: copy-initialization of std::atomic is ill-formed before C++17

void f(int j) {
    if(j%2==0) {
        ++i;
    }  
}

int main() {
    std::cout << "Atomic i = " << i << "\n";
    int numThreads = 8; //std::thread::hardware_concurrency() not yet implemented by Intel
    std::vector<std::thread> threads;
    for(int k=0; k< numThreads; ++k) {
        threads.push_back(std::thread(f, k));
    }

    std::for_each(threads.begin(), threads.end(), std::mem_fn(&std::thread::join));
    std::cout << "Atomic i = " << i << "\n";
}
Uyghur Lives Matter
user14717
    Atomics. If you're using GCC, search for `__sync builtins`, otherwise I do not know. –  Feb 17 '14 at 17:09
  • Do you need a meaningful count-in-progress, or only the final count after all threads ended? – smci Feb 17 '14 at 23:15

4 Answers


You might want to look into atomic types. You can access them without a need for a lock/mutex.

DarkDust
  • +1 for mentioning C++11 `<atomic>`, missed that the question is tagged C++11 :D –  Feb 17 '14 at 17:11

We solved a similar problem by declaring an array[nThreads] and giving every thread an id ranging from 0 to n-1; each thread can then safely write to its own position in the array. Afterwards you just sum the array to get the total. This is only helpful, however, as long as you don't need the sum before all the threads are dead.

To be even more efficient, we kept a local counter in each thread and appended it to the array just before the thread died.

Example (pseudocode):

counter[nThreads];

thread(int id)
{
    // Do stuff
    if(something happened)
       counter[id]++;   
}

or

counter[nThreads];

thread(int id)
{
    int localcounter = 0;
    //Do stuff
    if(something happened)
       localcounter++;   

    //Thread is about to die
    counter[id] = localcounter;
}
Simon Karlsson
    +1, nice concept: this might be handy for more complex data. It's like a simple map-and-reduce. – DarkDust Feb 17 '14 at 18:45
  • @DarkDust beat me to saying this is just map-reduce – smci Feb 17 '14 at 23:15
  • you fell into the classic trap: you didn't pad your array, separating each variable by at least a cache line (128 bytes). Because of this you didn't notice it, but the hardware accesses were interlocked by the MESI protocol. – v.oddou Feb 18 '14 at 05:56
  • @v.oddou This is why we made the second approach. But that's a really nice idea! – Simon Karlsson Feb 18 '14 at 07:21

On Windows, you can use the InterlockedIncrement function.

Many functions to mutate variables in atomic ways are documented on MSDN under the banner of Synchronization Functions - they may be useful to you.

Rich O'Kelly

It is not just a race condition: without synchronization, the compiler is free to never communicate the actual value of i between threads at all.

Obviously atomics are the good way. A mutex is also a good way: when there are no collisions it is as fast as an atomic, and it only gets slower when the kernel actually has to fiddle with the sleeping and ready thread queues. What can get taxing is if the wait signal doesn't use a condition variable, in which case you might have to wait for a scheduler tick before your ready threads run, which can be very long (30 ms).

Atomics will get you optimality though, and may even be easier to maintain than condition variables, since you don't have to care about spurious wakeups, notify_one versus notify_all, etc.

If you check the base classes of a C++11 STL's shared_ptr, they contain a shared count (base_count or base_shared_count or something like that) that works exactly like what you need. You may also check the new boost::shared_count implementation if you will.

v.oddou