I understand that in symmetric multiprocessor (SMP) systems, false sharing can occur because each core has its own cache. For example, in the following code from http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads:
    double sum = 0.0, sum_local[NUM_THREADS];
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int me = omp_get_thread_num();
        sum_local[me] = 0.0;

        #pragma omp for
        for (i = 0; i < N; i++)
            sum_local[me] += x[i] * y[i];

        #pragma omp atomic
        sum += sum_local[me];
    }
So my questions are:
- False sharing mainly comes from the fact that memory is accessed in chunks of a fixed number of bytes (cache lines), and that each processor core has its own cache. Each chunk is read and written as a unit. Is my understanding close to correct?
- What about the memory access pattern on GPUs? Do all the stream processors share one cache, or does each have its own? Is false sharing also a concern in GPU computing?