I'm experimenting with C++ standard threads. I wrote a small benchmark to test performance overhead and overall throughput. The principle is to run a loop of 1 billion iterations in one or several threads, pausing briefly from time to time.
In a first version I used counters in shared memory (i.e. normal variables). I expected the following output:
Sequential   1e+009 loops  4703 ms  212630 loops/ms
2 thrds:t1   1e+009 loops  4734 ms  211238 loops/ms
2 thrds:t2   1e+009 loops  4734 ms  211238 loops/ms
2 thrds:tt   2e+009 loops  4734 ms  422476 loops/ms
manythrd tn  1e+009 loops  7094 ms  140964 loops/ms
...
manythrd tt  6e+009 loops  7094 ms  845785 loops/ms
Unfortunately, the output showed some counters as if they were uninitialised!
I could work around the issue by storing the end value of each counter in an atomic<> for later display. However, I do not understand why the version based on plain shared variables does not work properly: each thread uses its own counter, so there is no race condition. Even the display thread accesses the counters only after the counting threads have finished. Using volatile did not help either.
Could anyone explain this strange behaviour (as if memory was not updated) and tell me whether I missed something?
Here are the shared variables:
const int maxthread = 6;
atomic<bool> other_finished{false};  // brace-init: copy-initialising an atomic is ill-formed before C++17
atomic<long> acounter[maxthread];
Here is the code of the threaded function:
void foo(long& count, int ic, long maxcount)
{
    count = 0;
    while (count < maxcount) {
        count++;
        if (count % 10000000 == 0)
            this_thread::sleep_for(chrono::microseconds(1));
    }
    other_finished = true;  // atomic: announce work is finished
    acounter[ic] = count;   // atomic: share result
}
Here is an example of how I benchmark the threads:
mytimer.on(); // second run, two threads
thread t1(foo, counter[0], 0, maxcount); // additional thread
foo(counter[1], 1, maxcount); // main thread
t1.join(); // wait end of additional thread
perf = mytimer.off();
display_perf("2 thrds:t1", counter[0], perf); // non atomic version of code
display_perf("2 thrds:t2", counter[1], perf);
display_perf("2 thrds:tt", counter[0] + counter[1], perf);