Trouble parallelizing OpenACC loop

Question

I have an old code written in Fortran that I need to accelerate using OpenACC, but when I add directives, the compiler reports a dependence on un, vn and pn that prevents parallelization. Is it possible to parallelize this loop? I am new to OpenACC but have parallelized code with OpenMP before.

    !$acc parallel loop
          do 9000 j=2,jmaxm
          jm=j-1
          jp=j+1

          do 9001 i=2,imaxm
          im=i-1
          ip=i+1

          if(rmask(i,j).eq.1.0) then

    ! Calculate un field

          un(i,j,kp)=un(i,j,km)+ tdt*rmask(i,j)*(
         +    txsav(i,j)*zn(nmm)/xpsi2(nmm)+ visch*zetun(i,j)
         +   -recdx*(pn(ip,j,k)-pn(i,j,k))-a*un(i,j,km)/cn(nmm)**2
         +   +0.25* fu(i,j)*(vn(i,j,k)+vn(ip,j,k)+vn(i,jm,k)+
         +    vn(ip,jm,k))
         +   -damp(i,j)*un(i,j,km)
         +   )

    c SBnd damper is not used
    cc     +   -(1./timkwd)*dampu(i,j)*un(i,j,km)

    ! Calculate vn field

          vn(i,j,kp)=vn(i,j,km)+ tdt*rmask(i,j)*(
         +    tysav(i,j)*zn(nmm)/xpsi2(nmm)+visch*zetvn(i,j)
         +   -recdy*(pn(i,jp,k)-pn(i,j,k))-a*vn(i,j,km)/cn(nmm)**2
         +   -0.25*fv(i,j)*(un(im,jp,k)+un(i,jp,k)+un(im,j,k)+
         +    un(i,j,k))
         +   )

    c EBnd damper is not used
    cc     +   -(1./timkwd)*dampv(i,j)*vn(i,j,km)

    ! Calculate pn field

          pn(i,j,kp)=pn(i,j,km)+tdt*rmask(i,j)*(
         +    cn(nmm)**2*(
         +   -recdx*(un(i,j,k)-un(im,j,k))
         +   -recdy*(vn(i,j,k)-vn(i,jm,k)))
         +   -a*pn(i,j,k)/cn(nmm)**2
         +   -dampu(i,j)*cn(nmm)/dx*pn(i,j,km)
         +   -dampv(i,j)*cn(nmm)/dx*pn(i,j,km)
         +   -damp(i,j)*pn(i,j,km)
         +   )

          rhon(i,j)=-pn(i,j,kp)/g
          wn(i,j)=
         +   -recdx*(un(i,j,kp)-un(im,j,kp))
         +   -recdy*(vn(i,j,kp)-vn(i,jm,kp))

          endif
     9001 continue
     9000 continue
    !$acc end parallel loop

It seems that your loop has some complex dependencies across iterations. You can try to force the compiler to generate the kernel and ignore the dependency detection by using the **independent** keyword, but that would probably produce a sequential kernel. What compiler are you using, and where have you written the pragmas? – Ruyk, Jan 13 '14 at 10:39
I need to know if it's possible to run the loop on the GPU with OpenACC. – Jovi DSilva, Jan 13 '14 at 15:06
@Ruyk I have added the pragmas in the edited code. I have tried both the `!$acc sections` and the `!$acc parallel loop` pragmas; both work out the same in my case. I am using PGI Fortran Workstation 2013. – Jovi DSilva, Jan 13 '14 at 15:12
I think it's probably possible to parallelize this, but you haven't shown enough of what is going on. I'm guessing there is a loop somewhere in `k` that you haven't shown. Anyway, you've got loop-carried dependencies, for example the `un` calculation depends on other neighboring `un` quantities (e.g. `un(i,j,km)`), and the compiler will choke on this. One approach would be to have two `un` arrays, e.g. `un` and `unn`, and ping-pong your calculations between them, i.e. `unn(i,j,kp)=un(i,j,km)+ tdt...` etc. Also, I would normally try `!$acc kernels` first before trying others. – Robert Crovella, Jan 15 '14 at 15:19
I will try that. There is only one loop before this one; it calculates zetvn(i,j) and zetun(i,j) and has the same start and end bounds as this one. In the meantime I will attach what I did to try to solve the problem. I got it working, but there is no gain in performance. – Jovi DSilva, Jan 28 '14 at 04:50
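
The ping-pong suggestion in the comments above can be sketched as follows. This is a minimal free-form Fortran illustration, not the original model: the arrays `u_old`, `u_new` and `w`, the grid size `n`, and the 4-point average are all hypothetical stand-ins. The first loop nest reads only `u_old` and writes only `u_new`, so its iterations do not depend on each other. The second nest reads neighbours of `u_new` (much as `wn` reads `un(im,j,kp)` in the question), so it is kept as a separate loop that runs after the first one has finished.

```fortran
program pingpong_sketch
  implicit none
  integer, parameter :: n = 64          ! hypothetical grid size
  real :: u_old(n,n), u_new(n,n), w(n,n)
  integer :: i, j

  ! Hypothetical input field standing in for the previous time level.
  do j = 1, n
    do i = 1, n
      u_old(i,j) = real(i) + 0.5*real(j)
    end do
  end do
  u_new = u_old   ! boundary values are carried over unchanged
  w = 0.0

  !$acc data copyin(u_old) copy(u_new, w)

  ! Loop nest 1: reads only u_old, writes only u_new,
  ! so no iteration depends on another one.
  !$acc parallel loop collapse(2)
  do j = 2, n-1
    do i = 2, n-1
      u_new(i,j) = 0.25*(u_old(i-1,j) + u_old(i+1,j) &
                       + u_old(i,j-1) + u_old(i,j+1))
    end do
  end do

  ! Loop nest 2: reads neighbours of u_new, so it is a separate
  ! loop that starts only after nest 1 has completed.
  !$acc parallel loop collapse(2)
  do j = 2, n-1
    do i = 2, n-1
      w(i,j) = u_new(i,j) - u_new(i-1,j)
    end do
  end do

  !$acc end data

  ! For this linear test field the interior difference should be 1.
  if (maxval(abs(w(2:n-1,2:n-1) - 1.0)) > 1.0e-5) error stop 'unexpected result'
  print *, 'max |w - 1| =', maxval(abs(w(2:n-1,2:n-1) - 1.0))
end program pingpong_sketch
```

Without an OpenACC-enabled compiler the directives are treated as comments and the program runs serially, which gives a reference to compare the accelerated run against.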

score 0 · Answer 1 · answered Mar 04 '18 at 03:47

Your loop has a data dependency, which means the algorithm as written is inherently sequential.

A simple example is the difference between Gauss-Seidel and Jacobi iterations, and the reason people use Jacobi on GPUs rather than Gauss-Seidel.
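
To make the comparison concrete, here is a minimal sketch of one sweep of each method on a small 1-D array. The array names, the size `n` and the boundary value are hypothetical and chosen only for illustration. Gauss-Seidel updates in place, so iteration `i` reads the value that iteration `i-1` has just written; Jacobi reads only the old array and writes to a new one, so its iterations can run in any order.

```fortran
program jacobi_vs_gauss_seidel
  implicit none
  integer, parameter :: n = 8           ! hypothetical array size
  real :: gs(n), jac_old(n), jac_new(n)
  integer :: i

  ! Same starting state for both methods: zero, with one boundary value.
  gs = 0.0
  gs(1) = 1.0
  jac_old = gs
  jac_new = gs

  ! Gauss-Seidel sweep: in-place update. Iteration i reads gs(i-1),
  ! which iteration i-1 just wrote: a loop-carried dependence.
  do i = 2, n-1
    gs(i) = 0.5*(gs(i-1) + gs(i+1))
  end do

  ! Jacobi sweep: reads only jac_old, writes only jac_new,
  ! so the iterations are independent of each other.
  !$acc parallel loop copyin(jac_old) copy(jac_new)
  do i = 2, n-1
    jac_new(i) = 0.5*(jac_old(i-1) + jac_old(i+1))
  end do

  ! After one sweep the boundary value has travelled along the whole
  ! Gauss-Seidel array but only one cell in the Jacobi array.
  if (abs(gs(3) - 0.25) > 1.0e-6) error stop 'Gauss-Seidel check failed'
  if (abs(jac_new(3)) > 1.0e-6) error stop 'Jacobi check failed'

  print *, 'Gauss-Seidel:', gs
  print *, 'Jacobi:      ', jac_new
end program jacobi_vs_gauss_seidel
```

Running the Gauss-Seidel loop in parallel would change its results, because each iteration would no longer be guaranteed to see its neighbour's updated value. The Jacobi loop gives the same answer in any execution order, at the cost of a second array.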

JimBamFeng


You can have Red-Black Gauss-Seidel. – Vladimir F Героям слава Mar 05 '18 at 14:34
Sure, I was just trying to point out the difference; you can use GMRES as well if you want :-) , spasiba – JimBamFeng Mar 05 '18 at 22:50
