
I'm working with OpenMP to parallelize a scalar nested for loop:

```c
double P[N][N];
double x=0.0,y=0.0;

for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction(x,y);
        y+=1;
    }
    x+=1;
}
```

In this loop, the important thing is that the matrix P must be the same in both the scalar and the parallel versions.

None of my attempts so far have succeeded...

linello

1 Answer

The problem here is that you have added iteration-to-iteration dependencies with:

```c
x+=1;
y+=1;
```

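Therefore, as the code stands right now, it is not parallelizable. Each iteration reads the `x` and `y` values left behind by the previous iteration, so when iterations run concurrently or out of order, the calls see the wrong values (and the updates to the shared `x` and `y` race with each other). Parallelizing it as-is will give incorrect results, as you are probably seeing.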

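Fortunately, in your case you can compute them directly from the loop indices, without introducing this dependency. Since `x` is incremented once per outer iteration and `y` is incremented once per inner iteration (and never reset), at iteration `(i, j)` you have `x == i` and `y == N*i + j`: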

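```c
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        // x == i and y == N*i + j at this point in the original loop;
        // casting N to double first also avoids int overflow in N*i
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}
```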

Now you can try throwing an OpenMP pragma over this and see if it works:

```c
#pragma omp parallel for
for (int i=0; i<N; i++)
{
    for (int j=0; j<N; j++)
    {
        P[i][j]=someLongFunction((double)i, (double)N*i + j);
    }
}
```
Mysticial
  • OK, thanks for the answer. Can I ask you another question? What if I want to reset y=0 every time before the inner loop? How would the OpenMP implementation change? – linello Dec 01 '11 at 10:05
  • Then change `(double)N*i + j` to `(double)j`. The key here is that I derived the expressions for `x` and `y` as a function of the loop indices. This lets you break the dependencies. – Mysticial Dec 01 '11 at 10:08
  • Many thanks for your answers; they showed me how to disentangle loops to prepare them for OpenMP parallelization. One last question: why do the serial and parallel versions of this code give me a very small number of different elements? `for (int i=0; i – linello Dec 01 '11 at 10:52
  • It shouldn't be different, unless you have a bug somewhere. You might want to post that as a separate question. Two other things: 1) I'm not too familiar with the `ordered` directive, but I think the way you're using it is potentially self-defeating. 2) If possible, parallelize the outer loop instead of the inner loop. – Mysticial Dec 01 '11 at 11:03