I've distilled the problem I have to its bare essentials. Here is the first example piece of code:
#include <vector>
#include <math.h>
#include <thread>

std::vector<double> vec(10000);

void run(void)
{
    for (int l = 0; l < 500000; l++) {
#pragma omp parallel for
        for (int idx = 0; idx < vec.size(); idx++) {
            vec[idx] += cos(idx);
        }
    }
}

int main(void)
{
#pragma omp parallel
    {
    }
    std::thread threaded_call(&run);
    threaded_call.join();
    return 0;
}
Compile this as (on Ubuntu 20.04): g++ -fopenmp main.cpp -o main
EDIT: Version: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Running on a Ryzen 3700X (8 cores, 16 threads): run time ~43 s, all 16 logical cores reported in System Monitor at ~80%.
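(For reference, the run times here are wall-clock times around the threaded call. If you want to reproduce the numbers, one way is a simple std::chrono wrapper like the sketch below; the exact timing method isn't essential to the problem.)
#include <chrono>
#include <cstdio>
#include <thread>

void run(void); // the run function from the example above

int main(void)
{
#pragma omp parallel
    {
    }
    auto t0 = std::chrono::steady_clock::now();
    std::thread threaded_call(&run);
    threaded_call.join();
    auto t1 = std::chrono::steady_clock::now();
    // prints the wall-clock time of the threaded call
    std::printf("run time: %.1f s\n", std::chrono::duration<double>(t1 - t0).count());
    return 0;
}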
Next, take out the #pragma omp parallel directive, so that the main function becomes:
int main(void)
{
    std::thread threaded_call(&run);
    threaded_call.join();
    return 0;
}
Now the run time is ~9 s, with all 16 logical cores reported in System Monitor at 100%.
I've also compiled this with MSVC on Windows 10; there the CPU utilization is always ~100%, whether or not the #pragma omp parallel directive is present. Yes, I am fully aware that this line should do absolutely nothing, yet with g++ it causes the behaviour above. It also only happens when run is called on a std::thread, not when it is called directly. I experimented with various compilation flags (-O levels), but the problem remains. I suppose looking at the assembly is the next step, but I can't see how this is anything but a bug in g++. Can anyone shed some light on this? It would be much appreciated.
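To be explicit, by "calling the run function directly" I mean a main like this sketch (same run as above); with this variant the slowdown does not appear:
int main(void)
{
#pragma omp parallel
    {
    }
    run(); // run called directly on the main thread, no std::thread involved
    return 0;
}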
Furthermore, calling omp_set_num_threads(1); in the run function, just before the loop, in order to check how long a single thread takes, gives a ~70 s run time with only one thread at 100% (as expected).
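Concretely, the single-thread check places the call just before the outer loop, roughly like this sketch:
#include <omp.h>

void run(void)
{
    omp_set_num_threads(1); // limit the OpenMP team to one thread for this check
    for (int l = 0; l < 500000; l++) {
#pragma omp parallel for
        for (int idx = 0; idx < vec.size(); idx++) {
            vec[idx] += cos(idx);
        }
    }
}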
A further, possibly related problem (although this might be a lack of understanding on my part): calling omp_set_num_threads(1); in the main function, before threaded_call is constructed, does nothing when compiling with g++, i.e. all 16 threads still execute the for loop, irrespective of the bogus #pragma omp parallel directive. When compiling with MSVC, this causes only one thread to run, as expected. According to the documentation for omp_set_num_threads I thought that should be the correct behaviour, but not so with g++. Why not? Is this a further bug?
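In other words, the variant I mean is roughly this sketch (with g++ the loop in run still uses all 16 threads; with MSVC it uses one):
#include <omp.h>
#include <thread>

int main(void)
{
    omp_set_num_threads(1); // I expected this to limit the OpenMP loop in run to one thread
    std::thread threaded_call(&run);
    threaded_call.join();
    return 0;
}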
EDIT: I understand this last problem now (Overriding OMP_NUM_THREADS from code - for real), but the original problem still stands.