0

I'm writing a program to count prime numbers from 2 to N using OpenMP. I wrote this code and while testing it, for example, with N = 10 (OMP_NUM_THREADS=2 ./main 10) I get mostly 4-s but sometimes I get 5-s in my output. It seems like there is a race condition that I cannot see and I don't now where it is. Can you help me to find this condition and get rid of it?

#include <iostream>
#include <omp.h>
#include <vector>

int main(int argc, char *argv[])
{
    if (argc != 2)
        return -1;

    const unsigned long N = std::atoi(argv[1]);

    unsigned counter = 0;
    std::vector<bool> numbers(N + 1, true);
    numbers[0] = numbers[1] = false;

#pragma omp parallel shared(numbers, counter)
    {
#pragma omp single
        {
            for (size_t i = 2; i * i <= N; ++i)
            {
                if (numbers[i] == true)
#pragma omp task
                {
                    for (size_t j = i * i; j <= N; j += i)
                        numbers[j] = false;
                }
            }
#pragma omp taskwait
        }
#pragma omp for reduction(+ \
                          : counter)
        for (size_t i = 1; i <= N; ++i)
            if (numbers[i] == true)
                counter++;
    }

    std::cout << counter << std::endl;

    return 0;
}

Edit: appreciate all the help in the comments. Calling sanitizer really helped to detect a data race condition. Adding "#pragma for critical" for the numbers[j]=false does solve the problem with multiple writings and dealing with race conditions.
But this works only for big numbers N, when we have multiple false writes in same position.
So I want to return previous question to particular case with N=10: here are no multiple writings in same region, take a look at cycles: (size_t i = 2; i * i <= N; ++i) will give us i = 2,3. So with for (size_t j = i * i; j <= N; j += i) we will have j = 4,6,8,10,9.
After this I took a look at the output "numbers" vector, and in case where my program returns "5" there is this array: 0 0 1 1 0 1 0 1 0 1 0 where bits represent numbers from 0 to 10 from left to right and prime numbers marked as "1"-s.
I think, it's impossible for the main "single" thread execute earlier than threads, called by "task" directive because I have #pragma omp taskwait. If that is true, then I don't know why I have such strange output. How do you think, what causes the problem here?

hiotareq
  • 1
  • 2
  • 1
    You could try "thread sanitizer" - it may be able to spot the problem. – Jesper Juhl Sep 18 '22 at 21:48
  • You seem to be implementing a sieve. But I don't understand the lowerbound on the `j` loop. Explain? Also, in a sieve each number will be scratched off multiple times, so your `numbers[j]=false` statement should be atomic, except that it would really surprise me if multiple `false` writes would give anything but `false`. Ok, so if you get 5 as output, which primes is it finding? – Victor Eijkhout Sep 18 '22 at 23:50
  • You should specify your platform and OpenMP library. I don't see any race condition. But your algorithm is incorrect because the tasks for the inner loop will run in parallel out of order. – relent95 Sep 19 '22 at 02:01
  • try building with gcc or clang with `-fsanitize=thread` option to check for any race condition. – ar2015 Sep 19 '22 at 04:28
  • I have compiled this code, and got inconsistent values with N=100 for various runs (2 threads). Adding a `critical` pragma for `numbers[j]=false` seems to solve the problem (but of course this cancels all benefits of multithreading). As @VictorEijkhout wrote this is somehow surprising: we do not expect concurrent writes of `false` to the same address give anything but `false` at the end, but still I think that this results in an unpredictable behavior. – PierU Sep 19 '22 at 09:59
  • Regarding your edit: I would try putting the `omp taskwait`outside of the `omp single` scope – PierU Sep 19 '22 at 11:49
  • Btw, use `atomic` rather than `critical`. Much faster. – Victor Eijkhout Sep 19 '22 at 13:37
  • @VictorEijkhout I tried on this code, and discovered that `atomic` could not be used on a `std:vector<>` (maybe there are ways, but my knowledge of C++ is quite basic). – PierU Sep 19 '22 at 13:40
  • @PierU I think the problem is that `atomic` wants to do a `+=` type op: read & write both. Try `numbers[i] = numbers[i] && false`? Otherwise for as far as I can tell it ought to work on vectors. – Victor Eijkhout Sep 19 '22 at 16:59
  • Generally `atomic write` can be used on `std::vector<>`, but it may not be used on `std::vector` if it is a space-efficient specialization. In this case use `std::vector`. Note that 1) there is also a race condition here: `if (numbers[i] == true)` , 2) false sharing may ruin your speed improvement. – Laci Sep 19 '22 at 17:09
  • Please check the atomic operations used in [this answer](https://stackoverflow.com/questions/69450314/openmp-invalid-controlling-predicate/69452857#69452857) . – Laci Sep 19 '22 at 17:17
  • @VictorEijkhout the problem is that `std::vector` is a space-efficient specialization, so multiple bit manipulations can cause unexpected result. – Laci Sep 19 '22 at 18:22
  • 1
    @Laci Oh crap. That thing. That's bitten me any number of times. – Victor Eijkhout Sep 19 '22 at 20:38
  • @Laci I have tested with `std::vector` instead of `std::vector` , and it indeed solves the problem (without the need of an `atomic` pragma, even if it would be safer in principle) – PierU Sep 20 '22 at 13:27

1 Answers1

0

Thanks to everyone who commented this question. Laci gave me the answer that I really needed to understand this condition in my code. It is that std::vector<bool> is a space-efficient specialization, so multiple bit manipulations can cause unexpected result. Solution is to change bool to char and you don't need any more directives so this code works like I expected.

hiotareq
  • 1
  • 2