OpenMP outputs incorrect answers

Question

I have typed this simple code to calculate the number of prime numbers between 2 and 5,000,000. The algorithm works fine and it outputs the correct answer, however when I try to use OpenMP to speedup the execution it outputs a different answer every time.

#include "time.h"
#include "stdio.h"
#include "omp.h"
int main()
{
    clock_t start = clock();
    int count = 1;
    int x;
    bool flag;
    #pragma omp parallel for schedule(static,1) num_threads(2) shared(count) private(x,flag)
    for (x = 3; x <= 5000000; x+=2)
    {
        flag = false;
        if (x == 2 || x == 3)
            count++;
        else if (x % 2 == 0 || x % 3 == 0)
            continue;
        else
        {
            for (int i = 5; i * i <= x; i += 6)
            {
                if (x % i == 0 || x % (i + 2) == 0)
                {
                    flag = true;
                    break;
                }
            }
            if (!flag)
                count++;
        }
    }
    clock_t end = clock();
    printf("The execution took %f ms\n", (double)end - start / CLOCKS_PER_SEC);
    printf("%d\n", count);
}

The code doesn't work for any number of threads, dynamic or static scheduling or different chunk sizes. I have tried messing with private and shared variables but it still didn't work and declaring x and flag inside the for loop didn't work either. I am using Visual Studio 2019 and I have OpenMP support enabled. What's the problem with my code ?

score 2 · Accepted Answer · answered Nov 24 '20 at 08:46

You have race conditions with your count variable where multiple threads can try to update it at the same time. The easy fix is to use an OpenMP reduction() clause to give each thread a private copy of the variable and have them all get added up properly:

#include <time.h>
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    clock_t start = clock();
    int count = 1;
#pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
    for (int x = 3; x <= 5000000; x+=2)
    {
        bool flag = false;
        if (x == 2 || x == 3)
            count++;
        else if (x % 2 == 0 || x % 3 == 0)
            continue;
        else
        {
            for (int i = 5; i * i <= x; i += 6)
            {
                if (x % i == 0 || x % (i + 2) == 0)
                {
                    flag = true;
                    break;
                }
            }
            if (!flag)
                count++;
        }
    }
    clock_t end = clock();
    printf("The execution took %f ms\n", (double)end - start / CLOCKS_PER_SEC);
    printf("%d\n", count);
}

This outputs 348513 (Verified as the right number through other software).

Also note cleaned up headers and moving some variable declarations around to avoid the need for a private() clause.

You could also make count an atomic int, but that's slower than using reduction() in my testing.

dreamcrash · Answer 2 · 2020-12-05T08:10:19.113

Just to add to the answer provided by @Shawn, besides solving the count race condition using the reduction OpenMP clause. You can also analyze if your code has load balancing issues, looking at the iterations of the loop that you are parallelizing it is clear that not all iterations have the same among of work. Since you are assigning work to threads in a static manner you might have one thread doing much more work than the other. Test around with the dynamic schedule to see if you notice any difference.

Besides that, you can significantly simplify your sequential code by removing all those conditional branchings that negatively affect the performance of your parallel version.

First you do not need (x == 2), since int x = 3;. You do not need (x == 3) either, just remove it and make count=2; (instead of count=1;) and int x = 5;, since the loop is incrementing in steps of 2 (i.e., x+=2). With this you can also remove this:

if (x == 2 || x == 3)
    count++;

Now because the loop starts at 5, and has an incremental step of 2, you know that it will be iterating over odd numbers only, so we can remove also x % 2 == 0 . Now we only have an if( x % 3 == 0) continue; else{..}, which can be simplified into if(x % 3 != 0){..}.

You can rewrite the code also to remove that break:

#pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
for (int x = 5; x <= 5000000; x += 2) {
    boolean flag = false;
    if (x % 3 != 0) {
        for (i = 5; !flag && i * i <= x; i += 6) {
            flag = (x % i == 0 || x % (i + 2) == 0);
        }
        if (!flag) {
            count++;
        }
    }
}

because you are using C/C++ you can even remove that if as well:

    int count = 2;
    #pragma omp parallel for schedule(static,1) num_threads(2) reduction(+:count)
    for (int x = 5; x <= 5000000; x += 2) {
        if (x % 3 != 0) {
            int flag = 1;
            for (int i = 5; flag && i * i <= x; i += 6) {
                flag = x % i != 0 && x % (i + 2) != 0;
            }
            count += flag;
        }
    }
    printf("%d\n", count);

IMO the code is more readable now, we could further improve it by given a good name to the variable flag.

OpenMP outputs incorrect answers

2 Answers2