Consider a parallel loop, where each thread will be computing on a private vector dudz(izfirst:izlast). In my implementation, I want to accomplish two things:

  • Not allocate memory when this parallel region is entered (it is called every time step)
  • Avoid false sharing (I am currently rewriting the code to avoid excess cache misses)

To avoid problem 1, I was thinking of creating the array dudz(izfirst:izlast,nproc) where each thread only accesses dudz(:, omp_id), but isn't this vulnerable to false sharing? To avoid false sharing, I was thinking about using private(dudz), but doesn't this allocate memory?

The following code can be adapted to either of my solutions, but which one is better? Is there a third alternative that handles both my concerns?

!$omp parallel do num_threads(nproc) private(ix, iz, ishift)
do ix = ixfirst, ixlast

    do iz = izfirst, izfirst+ophalf-1
        dudz(iz) = 0.0
    enddo
    !$omp simd        
    do iz = izfirst+ophalf, izlast-ophalf+1

        dudz(iz) = az(1)*( u(iz,ix) - u(iz-1,ix) )
        do ishift = 2, ophalf
            dudz(iz) = dudz(iz) + az(ishift)*( u(iz+ishift-1,ix) - u(iz-ishift,ix) )
        enddo

        dudz(iz) = dudz(iz)*buoy_z(iz,ix)

    enddo
    !$omp end simd
    do iz = izlast-ophalf+2, izlast
        dudz(iz) = 0.0
    enddo

enddo
!$omp end parallel do

Thank you for any advice.

NoseKnowsAll
  • As long as the first dimension of `dudz` is larger than a cache line, there's no real danger of false sharing – Gilles Dec 17 '15 at 18:36
  • @Gilles I don't really know how it works. When a piece of data is being read in for the first time, does the system automatically bring in a full cache line? Wouldn't that extend to the end of the domain as well? So if `omp_id=0` brings in data from 0-9, then 10-19 but the domain ends at 15, while another thread contains the data from "16-25" (aka 0-9 on `omp_id=1`) won't that result in false sharing? – NoseKnowsAll Dec 17 '15 at 18:44
  • It can, but 1/ the larger the array, the less likely it is to happen, and 2/ since the threads progress concurrently, they mostly keep from stepping on each other's toes – Gilles Dec 17 '15 at 18:49
  • Genius! I forgot to account for the concurrency almost always ensuring that two threads won't be accessing the same data. I will go with my initial `dudz(izfirst:izlast,nproc)` idea. Thanks, @Gilles – NoseKnowsAll Dec 17 '15 at 18:54
  • Finally, if you want to make 100% sure you don't have false sharing, just pad the first dimension of `dudz` to a multiple of the cache line's length. The drawback is that you have to query this length... – Gilles Dec 17 '15 at 19:26
  • @Gilles: When you set your array to a multiple of the cache line's length, if your array is large and your access pattern is very regular, you can run into the problem of 4k aliasing (same address in the cache but different physical address). It's probably better to set it to the first prime number after the boundary of the cache line. – Anthony Scemama Dec 18 '15 at 01:40
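
For what it's worth, here is a minimal sketch of the shared-scratch-array idea from the comments above: one column of `dudz` per thread, allocated once outside the time-step loop, with the first dimension padded. The 64-byte cache line, the 4-byte default `real`, the demo sizes, the `lead` and `tid` names, and the stand-in loop body are all assumptions for illustration, not taken from the question. One detail goes beyond the comments: `allocate` does not promise cache-line alignment, so this sketch adds one extra line of padding on top of rounding up, which keeps the used parts of neighbouring columns at least 64 bytes apart wherever the array starts.

```fortran
program padded_scratch
    use omp_lib
    implicit none

    ! Assumptions (not from the question): 64-byte cache lines,
    ! 4-byte default reals, and made-up demo sizes.
    integer, parameter :: line_bytes = 64
    integer, parameter :: reals_per_line = line_bytes / 4
    integer, parameter :: izfirst = 1, izlast = 100
    integer, parameter :: ixfirst = 1, ixlast = 32

    real, allocatable :: dudz(:,:)
    real :: checksum
    integer :: nz, lead, nproc, ix, iz, tid

    nz = izlast - izfirst + 1

    ! Round the column length up to whole cache lines, then add one
    ! more line so the gap between the used parts of two adjacent
    ! columns is at least line_bytes even if the base is unaligned.
    lead = ((nz + reals_per_line - 1) / reals_per_line) * reals_per_line
    lead = lead + reals_per_line

    nproc = omp_get_max_threads()

    ! Allocated once, before any time stepping; one column per thread.
    allocate(dudz(izfirst:izfirst+lead-1, 0:nproc-1))

    checksum = 0.0
    !$omp parallel do num_threads(nproc) private(ix, iz, tid) reduction(+:checksum)
    do ix = ixfirst, ixlast
        tid = omp_get_thread_num()          ! 0-based thread id
        do iz = izfirst, izlast
            dudz(iz, tid) = real(iz + ix)   ! stand-in for the stencil
        enddo
        checksum = checksum + sum(dudz(izfirst:izlast, tid))
    enddo
    !$omp end parallel do

    print *, 'padded column length (reals):', lead
    print *, 'checksum:', checksum

    deallocate(dudz)
end program padded_scratch
```

This needs OpenMP enabled at compile time (for example `gfortran -fopenmp`). If 4k aliasing turns out to matter, as raised in the last comment, `lead` is the single value to adjust; whether that is needed here would have to be measured.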

0 Answers