OpenMP parallel for slows down my code (C language)

Question

I'm trying to use OpenMP to speed up a parallel version of list ranking. My implementation is as follows:

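#include <stdlib.h>

int ListRankingParallel(int *R1,int *S, int N)
{
    int i;
    int *Q = (int*)malloc(N * sizeof(int));   /* working copy of the successor array */

    /* initial ranks: 1 hop to the successor, 0 for the tail (S[i] == -1) */
    #pragma omp parallel for private(i)
    for (i=0; i<N; i++){
        if( S[i] != -1) R1[i] = 1;
        else R1[i] = 0;
        Q[i] = S[i];
    }

    /* pointer jumping: add the successor's rank, then skip over it.
       && (not &) so that Q[Q[i]] is never read when Q[i] == -1.
       Other threads may be updating R1[Q[i]] and Q[Q[i]] at the same time. */
    #pragma omp parallel for private(i)
    for(i=0; i<N; i++)
        while (Q[i] != -1 && Q[Q[i]] != -1) {
            R1[i] = R1[i] + R1[Q[i]];
            Q[i] = Q[Q[i]];
        }

    free(Q);

    return *R1;
}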

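The serial version of my list ranking is:

int ListRankingSerial(int *R2,int *S, int N)
{
    int temp;
    int j,i;
    for( i=0; i<N; i++){
        j = 0;
        temp = S[i];        /* save S[i]; it is overwritten during the walk */
        while(S[i]!=-1)     /* walk one hop at a time until the tail */
        {
            j++;
            S[i] = S[S[i]];
        }
        R2[i] = j;          /* j = number of hops from i to the tail */
        S[i] = temp;        /* restore the successor array */
    }
    /* one chain of N nodes costs about N*N/2 steps in total */

    return *R2;
}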

I time each of them separately using:

get_walltime(&S1);
ListRankingParallel(R1,S,N);
get_walltime(&E1);

get_walltime(&S3);
ListRankingSerial(R3,S,N);
get_walltime(&E3);

On my Mac, the parallel version runs significantly faster than the serial version. However, on a Linux cluster, the parallel version is about twice as slow as the serial version.

On my Mac, I compile my code using

gcc-7 -fopenmp <file name>.c

On the cluster, using

gcc -fopenmp <file name>.c

If you want to test my code, please use:

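#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* get_walltime() is not shown in the question; this hypothetical
   stand-in stores the current wall-clock time in seconds. */
void get_walltime(double *t)
{
    *t = omp_get_wtime();
}

int main(void){

    int N = 1e+5;
    int *S = (int*)malloc(N * sizeof(int));
    int *R1 = (int*)malloc(N * sizeof(int));
    int *R3 = (int*)malloc(N * sizeof(int));
    double S1,S3,E1,E3;
    int i;

    /* one linked list 0 -> 1 -> ... -> N-1; -1 marks the tail */
    for( i = 0; i < N; i++)
        S[i] = i+1;

    S[N-1] = -1;

    get_walltime(&S1);
    ListRankingParallel(R1,S,N);
    get_walltime(&E1);
    printf("%f\n",E1-S1);

    get_walltime(&S3);
    ListRankingSerial(R3,S,N);
    get_walltime(&E3);
    printf("%f\n",E3-S3);

    free(S);
    free(R1);
    free(R3);
    return 0;
}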

Can anyone please give me some advice? Thank you!

You're creating race conditions by accessing the same variable in an unsynchronized way. Try inserting `#pragma omp atomic` on the line before you update an array. — ack, Mar 21 '18 at 14:08
@AlexQuilliam I still don't quite understand how to deal with the race condition on the arrays. Could you please be more specific about how to modify my code? Thanks! — noobie2023, Mar 21 '18 at 14:13
List ranking is a problem that is notoriously hard to parallelize. Don't expect simple OpenMP code to do it. You really would need a strategy that avoids concurrent writes to the same array element. Doing it with atomic would be one way, but that is costly, so you wouldn't see any performance gain. There is a lot of literature out there about LR, read it. Also, instead of `private(i)` you should nowadays just declare `i` inside the `for` loop. — Jens Gustedt, Mar 21 '18 at 14:47
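As a sketch of the kind of strategy described above, here is one well-known option from the list-ranking literature: Wyllie's pointer jumping, run in synchronized rounds with two copies of each array. In every round, threads read only `Q`/`R` and write only `Qn`/`Rn`, so no thread ever reads an element that another thread is writing. It also declares the loop variable inside the `for`. The function name `list_rank_pointer_jumping` and its signature are hypothetical. It should compute the same ranks as the code in the question (hops to the tail) in about log2(N) rounds, at the cost of O(N log N) total work.

```c
#include <stdlib.h>
#include <string.h>

/* Wyllie-style pointer jumping with double buffering (illustrative sketch).
 * succ[i] is the successor of node i; -1 marks the tail.
 * On return, rank[i] holds the number of hops from i to the tail.
 * Returns 0 on success, -1 if an allocation fails. */
int list_rank_pointer_jumping(const int *succ, int *rank, int N)
{
    if (N <= 0)
        return 0;

    int *Q  = malloc(N * sizeof *Q);    /* pointers read in this round    */
    int *Qn = malloc(N * sizeof *Qn);   /* pointers written in this round */
    int *R  = malloc(N * sizeof *R);    /* ranks read in this round       */
    int *Rn = malloc(N * sizeof *Rn);   /* ranks written in this round    */
    if (!Q || !Qn || !R || !Rn) {
        free(Q); free(Qn); free(R); free(Rn);
        return -1;
    }

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        Q[i] = succ[i];
        R[i] = (succ[i] != -1) ? 1 : 0;  /* tail has rank 0 */
    }

    int active = 1;
    while (active) {                      /* about log2(N) rounds */
        active = 0;
        #pragma omp parallel for reduction(||:active)
        for (int i = 0; i < N; i++) {
            if (Q[i] != -1) {
                Rn[i] = R[i] + R[Q[i]];   /* add the successor's rank  */
                Qn[i] = Q[Q[i]];          /* jump over the successor   */
            } else {
                Rn[i] = R[i];
                Qn[i] = -1;
            }
            active = active || (Qn[i] != -1);
        }
        /* swap buffers: the next round reads what this round wrote */
        int *t = Q; Q = Qn; Qn = t;
        t = R; R = Rn; Rn = t;
    }

    memcpy(rank, R, N * sizeof *R);
    free(Q); free(Qn); free(R); free(Rn);
    return 0;
}
```

Without `-fopenmp` the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare its output against `ListRankingSerial`.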
BTW, it's better to `#include <stdlib.h>` than [to cast the return value of `malloc()` and family](/q/605845) in C. — Toby Speight, Mar 21 '18 at 15:11
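A minimal illustration of that advice, using a hypothetical helper `copy_successors()` that makes the same kind of working copy `Q` as `ListRankingParallel`. With `<stdlib.h>` in scope, `malloc()` is declared as returning `void *`, which converts implicitly to `int *` in C, so no cast is needed:

```c
#include <stdlib.h>   /* declares malloc() and free() */
#include <string.h>   /* declares memcpy() */

/* Hypothetical helper: returns a heap copy of the successor array S,
   or NULL if the allocation fails. The malloc() result is not cast. */
int *copy_successors(const int *S, int N)
{
    int *Q = malloc(N * sizeof *Q);   /* sizeof *Q follows Q's type */
    if (Q != NULL)
        memcpy(Q, S, N * sizeof *Q);
    return Q;
}
```

Writing `sizeof *Q` instead of `sizeof(int)` also keeps the allocation size correct if the element type of `Q` ever changes.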
Also, aside from the race condition in your parallel code, never ever ever do performance measurements on code that was compiled without optimization switches. Try using at the very least `-O3`, and better `-O3 -march=native -mtune=native` — Gilles, Mar 21 '18 at 18:17
Thank you guys! I think I just found the problem. Since my parallel version runs faster on my personal Mac, I don't think the race condition is the main obstacle here. The real problem is that when I ran it on the cluster, I did not specify the number of threads in my code. (I thought I would be assigned several threads by default, but with no specification in the code I was only given one.) Thanks again for your comments! I'm writing this for any possible future reference! — noobie2023, Mar 21 '18 at 21:52

score 0 · Accepted Answer · answered Mar 22 '18 at 23:49

Are you certain it is running on multiple threads?

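You should either set the OMP_NUM_THREADS environment variable or call omp_set_num_threads() at the start of main. Note that omp_get_max_threads() returns the number of threads the next parallel region will use by default, so passing its result straight back to omp_set_num_threads() changes nothing. omp_get_num_procs() returns the number of processors available, so you can do something like

int nprocs = omp_get_num_procs();
omp_set_num_threads(nprocs);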

See more information about setting the number of threads in this answer.

Edit: You can also check how many threads are being used by calling omp_get_num_threads() from inside a parallel region; outside any parallel region it always returns 1.
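A small sketch of both checks. The helper names are hypothetical; `omp_get_num_threads()`, `omp_get_num_procs()` and `omp_set_num_threads()` are standard OpenMP runtime routines, and the `#ifdef _OPENMP` guards let the file also compile without `-fopenmp` (it then reports a single thread):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Team size of a freshly started parallel region. omp_get_num_threads()
   is queried inside the region because outside one it always returns 1. */
int threads_in_parallel_region(void)
{
    int n = 1;                      /* serial build: one thread */
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single          /* one thread records the team size */
        n = omp_get_num_threads();
    }
#endif
    return n;
}

/* Requests one thread per available processor; returns the request. */
int request_one_thread_per_proc(void)
{
#ifdef _OPENMP
    int nprocs = omp_get_num_procs();
    omp_set_num_threads(nprocs);
    return nprocs;
#else
    return 1;
#endif
}
```

If `threads_in_parallel_region()` reports 1 on a multi-core machine, the parallel loops are effectively running serially, which matches what the asker saw on the cluster.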

Thank you! I had already mentioned this problem in my comment, but you're right: I solved my problem by specifying the number of threads! — noobie2023, Mar 23 '18 at 12:19
