So I need to measure execution time of some code inside for loop. Originally, I needed to measure several different activities, so I wrote a timer class to help me with that. After that I tried to speed things up by paralleling the for loop using OpenMP. The problem is that when running my code in parallel my time measurements become really different - the values increase approximately up to a factor of 10. So to avoid the possibility of flaw inside the timer class I started to measure execution time of the whole loop iteration, so structurally my code looks something like this:
#pragma omp parallel for num_threads(20)
for(size_t j = 0; j < entries.size(); ++j)
{
auto t1 = std::chrono::steady_clock::now();
// do stuff
auto t2 = std::chrono::steady_clock::now();
std::cout << "Execution time is "
<< std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count()
<< std::endl;
}
Here are some examples of difference between measurements in parallel and measurements in single thread:
Single-threaded | Multi-threaded |
---|---|
11.363545868021 | 94.154685442 |
4.8963048650184 | 16.618173163 |
4.939025568 | 25.4751074 |
18.447368772 | 110.709813843 |
Even though it is only a couple of examples, this behaviour seems to prevail in all loop iterations. I also tried to use boost's chrono library and thread_clock
but got the same result. Do I misunderstand something? What may be the cause of this? Maybe I get cumulative time of all threads?
Inside the for loop, during each iteration I read a different file. Based on this file I create and solve multitude of mixed-integer optimisation models. I solve them with MIP solver, which I set to run in one thread. The instance of the solver is created on each iteration. The only variables that is shared between iteration are constant strings which represents paths to some directories.
My machine has 32 threads (16 cores, 2 threads per core).
Also here are the timings of the whole application in single-threaded mode:
real 23m17.763s
user 21m46.284s
sys 1m28.187s
and in multi-threaded mode:
real 12m47.657s
user 156m20.479s
sys 2m34.311s