Consider a parallel loop in which each thread computes on a private vector dudz(izfirst:izlast). In my implementation, I want to accomplish two things:
- Not allocate memory when this parallel region is entered (it is called every time step)
- Avoid false sharing (I am currently rewriting the code to avoid excess cache misses)
To avoid problem 1, I was thinking of creating the array dudz(izfirst:izlast,nproc), where each thread only accesses dudz(:, omp_id), but isn't this vulnerable to false sharing?
To avoid false sharing, I was thinking about using private(dudz), but doesn't this allocate memory?
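To make the two options concrete, here is a minimal, self-contained sketch of both. It is not my real code: the array sizes, the simplified first-difference stencil, and the names dudz_all, tid, res_a and res_b are made up for illustration, and whether the private copy in option B ends up on the stack or the heap is exactly the part I am unsure about.

```fortran
program private_buffer_sketch
  use omp_lib
  implicit none
  ! All sizes below are hypothetical values chosen only for this sketch.
  integer, parameter :: izfirst = 1, izlast = 64
  integer, parameter :: ixfirst = 1, ixlast = 32
  integer, parameter :: nproc = 4
  real :: u(izfirst:izlast, ixfirst:ixlast)
  real :: res_a(ixfirst:ixlast), res_b(ixfirst:ixlast)
  real :: dudz(izfirst:izlast)          ! option B: privatized per thread
  real, allocatable :: dudz_all(:,:)    ! option A: one column per thread
  integer :: ix, iz, tid

  call random_number(u)

  ! Option A: allocate once (outside the time loop in the real code);
  ! each thread only touches its own column dudz_all(:, tid).
  allocate(dudz_all(izfirst:izlast, nproc))
  !$omp parallel do num_threads(nproc) private(ix, iz, tid)
  do ix = ixfirst, ixlast
    tid = omp_get_thread_num() + 1      ! OpenMP thread numbers start at 0
    dudz_all(izfirst, tid) = 0.0
    do iz = izfirst+1, izlast
      dudz_all(iz, tid) = u(iz, ix) - u(iz-1, ix)
    enddo
    res_a(ix) = sum(dudz_all(:, tid))
  enddo
  !$omp end parallel do

  ! Option B: let OpenMP give every thread its own copy of dudz.
  !$omp parallel do num_threads(nproc) private(ix, iz, dudz)
  do ix = ixfirst, ixlast
    dudz(izfirst) = 0.0
    do iz = izfirst+1, izlast
      dudz(iz) = u(iz, ix) - u(iz-1, ix)
    enddo
    res_b(ix) = sum(dudz)
  enddo
  !$omp end parallel do

  ! Both variants should compute the same per-column result.
  print *, 'max difference between variants: ', maxval(abs(res_a - res_b))
  deallocate(dudz_all)
end program private_buffer_sketch
```

With gfortran this should build with something like `gfortran -fopenmp sketch.f90`. In option A, the columns of neighbouring threads are contiguous in memory, which is where my false-sharing worry comes from.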
The following code can be adapted to either of my solutions, but which one is better? Is there a third alternative that handles both my concerns?
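    ! Note: dudz is deliberately left unscoped here; it would become either
    ! dudz(iz, omp_id) or a private(dudz) array, as described above.
    !$omp parallel do num_threads(nproc) private(ix, iz, ishift)
    do ix = ixfirst, ixlast
        ! Zero the lower boundary, where the stencil does not fit
        do iz = izfirst, izfirst+ophalf-1
            dudz(iz) = 0.0
        enddo
        ! Interior points: staggered finite-difference stencil of half-width ophalf
        !$omp simd
        do iz = izfirst+ophalf, izlast-ophalf+1
            dudz(iz) = az(1)*( u(iz,ix) - u(iz-1,ix) )
            do ishift = 2, ophalf
                dudz(iz) = dudz(iz) + az(ishift)*( u(iz+ishift-1,ix) - u(iz-ishift,ix) )
            enddo
            dudz(iz) = dudz(iz)*buoy_z(iz,ix)
        enddo
        !$omp end simd
        ! Zero the upper boundary
        do iz = izlast-ophalf+2, izlast
            dudz(iz) = 0.0
        enddo
    enddo
    !$omp end parallel do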
Thank you for any advice.