I am finding all primes up to a limit using the Sieve of Eratosthenes, and I tried to parallelize it with OpenMP. However, the speedup stops increasing beyond two threads!
My code is essentially two nested for loops. The outer loop increments its counter k by a constant (2) on each iteration. Inside the loop body, a while loop then advances k by a further 2*x, where x is not known in advance:
#pragma omp parallel for
for (int k = 3; k <= sqrt_size; k = k+2) {
    for (int curPrime = k*k; curPrime <= size; curPrime += 2*k) marked[curPrime/2] = 1;
    while (marked[k/2]) k += 2; // Find following unmarked value (unknown amount)
}
I suspect the lack of speedup beyond two threads is caused by the unknown amount being added to the counter variable inside the loop body. Should I make the counter shared and then place the increment, or the entire while loop, within a critical section?
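For comparison, here is a minimal sketch of an alternative I am considering instead of a critical section: keep the outer loop serial and put the `parallel for` only on the inner marking loop, whose step (2*k) is fixed for the duration of that loop. The function name `count_primes` and the use of `long` and `calloc` are my own assumptions for illustration; `marked[i]` stands for the odd number 2*i + 1, as in the code above. I have not measured whether this scales better.

```c
#include <stdlib.h>

/* Hypothetical helper (not my original code): counts the primes <= size
   using an odd-only sieve. marked[i] represents the odd number 2*i + 1. */
long count_primes(long size) {
    if (size < 2) return 0;

    char *marked = calloc((size_t)(size / 2 + 1), 1);
    if (!marked) return -1;  /* allocation failed */

    /* Outer loop stays serial, so k is never modified by several threads. */
    for (long k = 3; k * k <= size; k += 2) {
        if (marked[k / 2]) continue;  /* k is composite, nothing to do */

        /* Each iteration writes a different marked[] slot, so the
           iterations are independent and the step 2*k is loop-invariant. */
        #pragma omp parallel for
        for (long m = k * k; m <= size; m += 2 * k)
            marked[m / 2] = 1;
    }

    long count = 1;  /* account for the prime 2 */
    for (long i = 1; 2 * i + 1 <= size; i++)
        if (!marked[i]) count++;

    free(marked);
    return count;
}
```

With GCC or Clang this would be built with `-fopenmp`; without that flag the pragma is ignored and the function runs serially. Calling `count_primes(100)` should return 25. One trade-off I am aware of is that this opens a parallel region once per prime k, so the per-region overhead may dominate when the inner loop is short.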