When compiling the following code
#include <iostream>
#include <vector>
#include <thread>
#include <chrono>
#include <mutex>
std::mutex cout_mut;
void task()
{
for(int i=0; i<10; i++)
{
double d=0.0;
for(size_t cnt=0; cnt<200000000; cnt++) d += 1.23456;
std::lock_guard<std::mutex> lg(cout_mut);
std::cout << d << "(Help)" << std::endl;
// std::cout << "(Help)" << d << std::endl;
}
}
int main()
{
std::vector<std::thread> all_t(std::thread::hardware_concurrency());
auto t_begin = std::chrono::high_resolution_clock::now();
for(auto& t : all_t) t = std::thread{task};
for(auto& t : all_t) t.join();
auto t_end = std::chrono::high_resolution_clock::now();
std::cout << "Took : " << (t_end - t_begin).count() << std::endl;
}
Under MinGW 4.8.1 it takes roughly 2.5 seconds to execute on my box. That is approximately the time it takes to only execute the task
function single-threadedly.
However, when I uncomment the line in the middle and therefore comment out the line before (that is, when I exchange the order in which d
and "(Help)"
are written to std::cout
) the whole thing takes now 8-9 seconds.
What is the explanation?
I tested again and found out that I only have the problem with MinGW-build x32-4.8.1-win32-dwarf-rev3
but not with MinGW build x64-4.8.1-posix-seh-rev3
. I have a 64-bit machine. With the 64-bit compiler both versions take three seconds. However, using the 32-bit compiler, the problem remains (and is not due to release/debug version confusion).