I have a small test program that compiles alongside my library and benchmarks the speed of various mathematics functions implemented with different methods (SSE, for-loop, unrolled loop, etc.). These tests run each method hundreds of thousands of times and compute the mean time taken. I decided to create four worker threads, one for each core of my machine, and run the benchmarks that way.
Now, these are micro-benchmarks measured in nanoseconds, so the differences may look large, but at that scale there really is no other kind of difference.
Here is my code for running over functions in a single-threaded fashion:
static constexpr std::size_t num_tests = 400000;

auto do_test = [=](uint64_t (*test)()) {
    // test is a function that returns the nanoseconds taken by a specific method
    uint64_t accum = 0;
    for (std::size_t n = 0; n < num_tests; n++)
        accum += test();
    return accum / num_tests;
};
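For reference, each test function is shaped roughly like this (`test_sqrt_loop` and its workload are made-up placeholders for illustration, not my real code):

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>

// Illustrative placeholder for one of the test functions: it times a single
// run of the method under test and returns the elapsed wall-clock nanoseconds.
static volatile double sink; // keeps the compiler from optimising the work away

uint64_t test_sqrt_loop() {
    auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int i = 1; i <= 64; ++i)
        acc += std::sqrt(static_cast<double>(i));
    sink = acc; // publish the result so the loop is not dead code
    auto stop = std::chrono::steady_clock::now();
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
}
```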
and here is my (faster) code for running over tests in a multi-threaded fashion:
static constexpr std::size_t num_tests = 100000;

auto do_test = [=](uint64_t (*test)()) {
    uint64_t accum = 0;
    std::thread first([&]() {
        for (std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread second([&]() {
        for (std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread third([&]() {
        for (std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    std::thread fourth([&]() {
        for (std::size_t n = 0; n < num_tests; n++)
            accum += test();
    });
    first.join();
    second.join();
    third.join();
    fourth.join();
    return accum / (num_tests * 4);
};
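My only guess so far is that the four threads all hammering the shared `accum` has something to do with it. A variant with per-thread accumulators would look like this (just a sketch I have not measured; `do_test_partitioned` and the hard-coded thread count are illustrative):

```cpp
#include <cstdint>
#include <thread>
#include <vector>

static constexpr std::size_t num_tests = 100000;
static constexpr std::size_t num_threads = 4;

// Sketch: each thread sums into its own local variable, then writes one
// partial result, so the workers never touch a shared counter in the loop.
uint64_t do_test_partitioned(uint64_t (*test)()) {
    std::vector<uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&partial, t, test]() {
            uint64_t local = 0; // thread-private accumulator
            for (std::size_t n = 0; n < num_tests; ++n)
                local += test();
            partial[t] = local; // a single write per thread at the end
        });
    }
    for (auto& w : workers)
        w.join();
    uint64_t accum = 0;
    for (uint64_t p : partial)
        accum += p;
    return accum / (num_tests * num_threads);
}
```

Would that kind of change even matter here, or is the slowdown something else entirely?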
BUT the results are slower D: the whole run finishes faster, yet the operations themselves report slower times. My single-threaded version gives a mean of 77 nanoseconds, whereas my multithreaded version gives a mean of 150 nanoseconds per operation!
Why would this be?
P.S. I know it's a minuscule difference; I just thought it was interesting.