I am confused about the data-sharing scope of the variable acc in the following two cases. In case 1 I get the following compilation error:
error: reduction variable ‘acc’ is private in outer context
whereas case 2 compiles without any issues.
According to this article, variables defined outside a parallel region are shared.
Why does adding for-loop parallelism privatize acc? How can I, in this case, accumulate the result calculated in the for loop while also distributing the loop's iteration space across a thread team?
Case 1:
float acc = 0.0f;
#pragma omp for simd reduction(+: acc)
for (int k = 0; k < MATRIX_SIZE; k++) {
    float mul = alpha;
    mul *= a[i * MATRIX_SIZE + k];
    mul *= b[j * MATRIX_SIZE + k];
    acc += mul;
}
Case 2:
float acc = 0.0f;
#pragma omp simd reduction(+: acc)
for (int k = 0; k < MATRIX_SIZE; k++) {
    float mul = alpha;
    mul *= a[i * MATRIX_SIZE + k];
    mul *= b[j * MATRIX_SIZE + k];
    acc += mul;
}
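
For comparison, here is a minimal sketch of a variant that should avoid the error. It rests on an assumption the snippets above do not show: that in case 1 acc is declared inside an enclosing parallel region. In that situation every thread gets its own private acc, and a reduction on a worksharing construct such as for requires the variable to be shared in the enclosing context. simd is not a worksharing construct, which would explain why case 2 is accepted. The sketch declares acc outside the parallel region and uses the combined parallel for simd construct; the function name dot_row, its parameter list, and the MATRIX_SIZE value are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MATRIX_SIZE 64  /* assumed value, for illustration only */

/* Hypothetical helper: computes alpha * sum_k a[i][k] * b[j][k]
 * for row-major MATRIX_SIZE x MATRIX_SIZE matrices a and b. */
static float dot_row(const float *a, const float *b, float alpha, int i, int j)
{
    assert(a != NULL && b != NULL);

    /* Declared before the parallel region starts, so acc is shared in the
     * context enclosing the worksharing loop and may appear in reduction(). */
    float acc = 0.0f;

    /* The combined construct creates the thread team, splits the k
     * iterations across it, and vectorizes each thread's chunk. */
    #pragma omp parallel for simd reduction(+: acc)
    for (int k = 0; k < MATRIX_SIZE; k++) {
        float mul = alpha;
        mul *= a[i * MATRIX_SIZE + k];
        mul *= b[j * MATRIX_SIZE + k];
        acc += mul;
    }
    return acc;
}
```

If the surrounding i and j loops are already inside a parallel region, the other option would be to keep the orphaned for simd directive but make sure acc is shared in that region (for example by declaring it before the region begins). That needs extra synchronization around the reset and read of acc, so parallelizing the outer loops and keeping only simd on the k loop, as in case 2, may be the simpler choice.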