I've done some searching but couldn't find anything that appeared to be related to my question (sorry if my question is redundant!). Anyway, as the title states, I'm having trouble getting any improvement over the serial implementation of my code. The code snippet that I need to parallelize is as follows (this is Fortran90 with OpenMP):
do n=1,lm
do m=1,jm
do l=1,im
sum_u = 0
sum_v = 0
sum_t = 0
do k=1,lm
!$omp parallel do reduction (+:sum_u,sum_v,sum_t)
do j=1,jm
do i=1,im
exp_smoother=exp(-(abs(i-l)/hzscl)-(abs(j-m)/hzscl)-(abs(k-n)/vscl))
sum_u = sum_u + u_p(i,j,k) * exp_smoother
sum_v = sum_v + v_p(i,j,k) * exp_smoother
sum_t = sum_t + t_p(i,j,k) * exp_smoother
sum_u_pert(l,m,n) = sum_u
sum_v_pert(l,m,n) = sum_v
sum_t_pert(l,m,n) = sum_t
end do
end do
end do
end do
end do
end do
Am I running into race condition issues? Or am I simply putting the directive in the wrong place? I'm pretty new to this, so I apologize if this is an overly simplistic problem.
Anyway, without parallelization, the code is excruciatingly slow. To give an idea of the size of the problem, the lm, jm, and im indexes are 60, 401, and 501 respectively. So the parallelization is critical. Any help or links to helpful resources would be very much appreciated! I'm using xlf to compile the above code, if that's at all useful.
Thanks! -Jen