I have a C++ project which uses OpenMP, and in one place in the code I have a `#pragma omp simd` loop nested inside a `#pragma omp parallel` region. There was a consistent crash in the code which happened only in multi-threaded runs compiled in debug mode (and not in release). I reduced it to a short reproducible example which demonstrates the problem:
```cpp
#include <iostream>
#include <atomic>
#include <omp.h>

struct A {
    int z;
};

int main() {
    size_t size = 100;
    auto A_arr = new A*[size];

    // Fill the array of pointers in parallel.
    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic)
        for (size_t x = 0; x < size; ++x) {
            A_arr[x] = new A{0};
        }
    }

    // Read back every element with a SIMD loop inside a parallel region.
    #pragma omp parallel
    {
        A** begin = A_arr;
        #pragma omp simd
        for (size_t x = 0; x < size; ++x) {
            A* a = *begin;
            auto z = a->z;
            begin++;
        }
    }

    delete[] A_arr;
    return 0;
}
```
Compiling this with `icpc` in debug mode runs just fine. But if I change the SIMD loop to
```cpp
        #pragma omp simd
        for (size_t x = 0; x < size; ++x) {
            A* a = begin[x];
            auto z = a->z;
        }
    }
```
(which should be logically equivalent), the code suddenly crashes when compiled in debug mode, and works fine in release mode.
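For completeness, the whole second parallel region in the crashing variant then reads as follows (this is just the two snippets above combined, nothing new):

```cpp
    #pragma omp parallel
    {
        A** begin = A_arr;
        #pragma omp simd
        for (size_t x = 0; x < size; ++x) {
            A* a = begin[x];   // indexed access instead of incrementing begin
            auto z = a->z;
        }
    }
```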
I did a lot of debugging to isolate the problematic part of the code, and I think the example I presented needs no further context.
I also tried using `gdb` (in the crash it sometimes claims that `a` is `NULL`, and sometimes it points to a memory location which cannot be read from), and `valgrind` (which ran successfully).
From searching online, I understand that the SIMD vectorization doesn't happen at `-O0`, but apparently the SIMD directive still makes the compiler make some assumptions about the loop's iterations, which may explain the different behaviour in debug and release modes.
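For example, this is the kind of check I imagine could separate the effect of the directive itself from actual vectorization: guard the pragma behind a macro and build the same debug configuration twice, once with `-DUSE_OMP_SIMD` and once without. This is only a sketch; the function name `read_all` and the `USE_OMP_SIMD` macro are mine, not part of the original code.

```cpp
#include <cstddef>

struct A { int z; };

// Same reading loop as in the reproducer, with the directive made optional so
// that two -O0 builds differ only in whether the SIMD assertion is present
// (compile once with -DUSE_OMP_SIMD and once without).
void read_all(A** begin, std::size_t size) {
#ifdef USE_OMP_SIMD
    #pragma omp simd
#endif
    for (std::size_t x = 0; x < size; ++x) {
        A* a = begin[x];
        auto z = a->z;
        (void)z;  // keep the read without an unused-variable warning
    }
}
```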
Of course, keeping the original form of the loop avoids the problem, but I wish to understand better what happens here, and whether there is a "hidden bug" which I have just buried deeper.
Thanks in advance!