I learnt OpenMP from Tim Mattson's lecture notes, and he gives the example of false sharing below. The code is simple: it computes pi by numerically integrating 4.0/(1+x*x) for x from 0 to 1. Each thread accumulates its partial sum of 4.0/(1+x*x) into its own element of a shared array, and the partial sums are combined at the end:
#include <omp.h>
static long num_steps = 100000;
double step;
#define NUM_THREADS 2

int main()                       /* was: void main() */
{
    int i, nthreads;
    double pi, sum[NUM_THREADS];
    step = 1.0/(double)num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id, nthrds;
        double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;
        for (i = id, sum[id] = 0.0; i < num_steps; i = i + nthrds) {
            x = (i + 0.5)*step;
            /* sum[0] and sum[1] are adjacent doubles, so they share a cache line */
            sum[id] += 4.0/(1.0 + x*x);
        }
    }
    /* was: for (i=0; pi=0.0; i<nthreads;i++) -- the init and condition were swapped */
    for (i = 0, pi = 0.0; i < nthreads; i++) pi += sum[i]*step;
    return 0;
}
I have some questions about false sharing from this example:
- Is the false sharing caused by the fact that the writes to the array are interleaved between the two threads, i.e. [thread0, thread1, thread0, thread1, ...]? If we instead use
#pragma omp parallel for
, the iterations are divided into contiguous chunks, i.e. [thread0, thread0, thread0, ..., thread1, thread1, thread1, ...]. Do we still have false sharing, now that the addresses each thread accesses are far apart?
- If I have a job that uses
#pragma omp parallel for
to write to an output vector that has a 1-to-1 correspondence with my input vector (for example, the input is a matrix of predictors and the output is a vector of predictions), when do I need to worry about false sharing?