
The following code shows my implementation of a list-ranking algorithm with OpenMP. When I run it without the pragmas the result is always correct, but with the pragmas the output is occasionally wrong, at random. The outputs are shown at the bottom; the second run was wrong. Is there an error in the way I used the pragmas, or is there a dependency I am missing? (The sequential output is the expected output. When the parallel output matches the sequential output, the program prints "data OK".)

Both `length` and the number of threads (`nthreads`) are 16.

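#define NSIZE 1
#define NMAX 16
int Ns[NSIZE] = {16};
int A[NMAX] = {14,13,5,16,11,10,9,12,0,8,7,15,4,3,2,1};  // input list
int B[NMAX + 1] = {0};     // 1-based copy of A

int S[NMAX + 1] = {0};     // successors read into next[] below (filled elsewhere, not shown)

int Rp[NMAX + 1] = {0};    // rank of each node
int next[NMAX+1] = {0};    // successor pointers, updated by pointer jumping

// copy A into B, shifting to 1-based indexing
for(int i = 1, j=0; i <= n; i++, j++)
{
    B[i] = A[j];
}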

int chunk = (length + nthreads - 1) / nthreads;  // round up; ceil(length/nthreads) truncates first because of integer division
int i, j;
int tid;
//#pragma omp parallel num_threads(nthreads)
//{
//#pragma omp for schedule(dynamic, chunk) private(i)
for(i = 1; i <= length; i++)
{

    Rp[i] = 1;
    next[i] = S[i];
}


// pointer jumping: log2(length) rounds
for(i = 1; i<=log2(length); i++)
{
    #pragma omp parallel num_threads(nthreads) shared(Rp,next,chunk) private(j)
    {
        #pragma omp for schedule(dynamic,chunk)
        for(j = 1; j <= length; j++)
        {
            if(next[j]!=0)
            {
                // add the successor's rank, then jump to the successor's successor
                Rp[j] = Rp[j] + Rp[next[j]];
                next[j] = next[next[j]];
            }
        }
    }
}

OUTPUT:

./a.out -- (This was the output when I ran the program the first time)

from parallel
data OK
Input: 14 13 5 16 11 10 9 12 0 8 7 15 4 3 2 1
Sequential: 6 10 4 8 3 15 1 13 0 14 2 12 9 5 11 7
Parallel : 6 10 4 8 3 15 1 13 0 14 2 12 9 5 11 7

./a.out -- (output when I ran the program the second time)

from parallel
data MISMATCH!!!
Input: 14 13 5 16 11 10 9 12 0 8 7 15 4 3 2 1
Sequential: 6 10 4 8 3 15 1 13 0 14 2 12 9 5 11 7
Parallel : 6 10 4 8 3 15 1 13 0 10 2 12 9 5 11 7

./a.out -- (output when I ran the program the third time)

from parallel
data OK
Input: 14 13 5 16 11 10 9 12 0 8 7 15 4 3 2 1
Sequential: 6 10 4 8 3 15 1 13 0 14 2 12 9 5 11 7
Parallel : 6 10 4 8 3 15 1 13 0 14 2 12 9 5 11 7

kumar
  • Please edit your question to include an actual Minimal, Complete, and Verifiable Example (MCVE). Where are `Rp`, `next` and `S` defined? What are the random errors in the output: actual errors, or unexpected output? (Please include the actual output and the expected output.) – Jonny Henly Aug 23 '16 at 21:32
  • You have a race condition here: `next[j] = next[next[j]];`. Indeed, this depends on the order in which you traverse the `j` loop, so parallelising it just changes that order and can therefore change the result. – Gilles Aug 24 '16 at 08:29
  • @Gilles Thanks for the reply. I eliminated the race condition by putting the `if` block in a critical region, but that means I am now effectively running the algorithm sequentially. Is there any other way to eliminate the race without changing the algorithm? – kumar Aug 24 '16 at 09:04
  • This is actually more than just a race condition. The trouble is that this algorithm is fundamentally not parallelisable, because the order of execution of the `j` loop matters here. You could get away with using the `ordered` directive, but you would ultimately end up with a sequential algorithm. Therefore, if you want it to be truly parallel, you have to change your algorithm fundamentally. – Gilles Aug 24 '16 at 09:23

1 Answer


I think this should be possible with a second set of arrays. Correct me if I am wrong.

Use two arrays for `next`, accessed through two pointers `next_new` and `next_old`, and likewise two arrays for `Rp` with the pointers `Rp_new` and `Rp_old`. That should do the trick.

In each round, read only from the old arrays and write only to the new ones, then swap the pointers after the round.

Rp_new[j] = Rp_old[j] + Rp_old[next_old[j]];

next_new[j] = next_old[next_old[j]];
Kalpesh Dusane