I learnt OpenMP from Tim Mattson's lecture notes, and he gives the example of false sharing below. The code is simple: it computes pi by numerically integrating 4.0/(1+x*x) for x from 0 to 1. Each thread accumulates its partial sum of 4.0/(1+x*x) into its own element of a shared array, and the partial sums are combined at the end:
#include <omp.h>
static long num_steps = 100000;
double step;
#define NUM_THREADS 2

int main()                       /* was: void main() */
{
    int i, nthreads;
    double pi, sum[NUM_THREADS];
    step = 1.0/(double)num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id, nthrds;
        double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0) nthreads = nthrds;
        for (i = id, sum[id] = 0.0; i < num_steps; i = i + nthrds) {
            x = (i + 0.5)*step;
            /* sum[0] and sum[1] are adjacent doubles, so they share a cache line */
            sum[id] += 4.0/(1.0 + x*x);
        }
    }
    /* was: for (i=0; pi=0.0; i<nthreads;i++) -- the init and condition were swapped */
    for (i = 0, pi = 0.0; i < nthreads; i++) pi += sum[i]*step;
    return 0;
}
I have some questions about false sharing from this example:
- Is the false sharing caused by the fact that the writes to the array are interleaved between the two threads, i.e. [thread0, thread1, thread0, thread1, ...]? If we instead use
#pragma omp parallel for
, the iterations are divided into contiguous chunks, i.e. [thread0, thread0, thread0, ..., thread1, thread1, thread1, ...]. Do we still have false sharing, now that the addresses each thread accesses are far apart?
- If I have a job that uses
#pragma omp parallel for
to write to an output vector that has a 1-to-1 correspondence with my input vector (for example, the input is a matrix of predictors and the output is a vector of predictions), when do I need to worry about false sharing?