I have this code that worked for years (and still works with some other compilers).

We expect the sequential and the parallel execution to give the same result.

The symptom is that the parallel execution produces a different result on each run.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{

    int i, N, j, sum;
    int ** A;

    sum=0;
    N=1000;

    A=(int**)malloc(N*sizeof(int*));
    for (i=0;i<N;i++) {
        A[i] = (int*)malloc(N *sizeof(int));
    }

    for (i=0; i<N; ++i) {
        for (j=0; j<N; ++j) {
            A[i][j]=i+j;
            sum+=A[i][j];
        }
    }

    printf("Total sum = %d \n",sum);
    sum=0;


    #pragma omp parallel for reduction(+:sum)
    for (i=0; i<N; ++i) {
        for (j=0; j<N; ++j) {
            sum += A[i][j];
        }
    }

    printf("Total sum = %d \n",sum);

    for (i=0;i<N;i++){ free(A[i]);}
    free(A);

    return 0;
}

We compile it like this:

gcc -fopenmp reduction.c

And run it like this:

./a.out
Total sum = 999000000
Total sum = 822136991

It works with icc.

Edit: with optimization -O3, gcc also gives the correct result.

Yann Sagon

2 Answers

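The problem is that you have a nested loop and the pragma applies only to the outer one: OpenMP makes the outer loop variable i private automatically, but j, which is declared outside the parallel region, stays shared, so the threads race on it. One fix is the collapse clause, which parallelizes both loops together and makes both loop variables private. You can read about it in this question and on this site. The program works correctly if you replace your #pragma line with: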

#pragma omp parallel for reduction(+:sum) collapse(2)
jdehesa

Here are three ways to fix your code. In each of them, j ends up private to each thread instead of shared:

Explicitly make j private

#pragma omp parallel for reduction(+:sum) private(j)
for (i=0; i<N; ++i) {
    for (j=0; j<N; ++j) {
        sum += A[i][j];
    }
}

Change your code and define i and j inside a parallel region

#pragma omp parallel reduction(+:sum)
{
    int i,j;
    #pragma omp for
    for (i=0; i<N; ++i) {
        for (j=0; j<N; ++j) {
            sum += A[i][j];
        }
    }
}

Use C99 (or GNU99) and change your code to

#pragma omp parallel for reduction(+:sum)
for (int i=0; i<N; ++i) {
    for (int j=0; j<N; ++j) {
        sum += A[i][j];
    }
}
Z boson