
I have an old code written in Fortran and I need to accelerate it using OpenACC, but when I try using directives, the compiler says there is a dependence on un, vn and pn which prevents parallelization. Is it possible to parallelize this loop? I am new to OpenACC but have parallelized code with OpenMP.

    !$acc parallel loop
          do 9000 j=2,jmaxm
          jm=j-1
          jp=j+1

          do 9001 i=2,imaxm
          im=i-1
          ip=i+1

          if(rmask(i,j).eq.1.0) then

    ! Calculate un field

          un(i,j,kp)=un(i,j,km)+ tdt*rmask(i,j)*(
         +    txsav(i,j)*zn(nmm)/xpsi2(nmm)+ visch*zetun(i,j)
         +   -recdx*(pn(ip,j,k)-pn(i,j,k))-a*un(i,j,km)/cn(nmm)**2
         +   +0.25* fu(i,j)*(vn(i,j,k)+vn(ip,j,k)+vn(i,jm,k)+
         +    vn(ip,jm,k))
         +   -damp(i,j)*un(i,j,km)
         +   )

    c SBnd damper is not used
    cc     +   -(1./timkwd)*dampu(i,j)*un(i,j,km)

    ! Calculate vn field

          vn(i,j,kp)=vn(i,j,km)+ tdt*rmask(i,j)*(
         +    tysav(i,j)*zn(nmm)/xpsi2(nmm)+visch*zetvn(i,j)
         +   -recdy*(pn(i,jp,k)-pn(i,j,k))-a*vn(i,j,km)/cn(nmm)**2
         +   -0.25*fv(i,j)*(un(im,jp,k)+un(i,jp,k)+un(im,j,k)+
         +    un(i,j,k))
         +   )

    c EBnd damper is not used
    cc     +   -(1./timkwd)*dampv(i,j)*vn(i,j,km)

    ! Calculate pn field

          pn(i,j,kp)=pn(i,j,km)+tdt*rmask(i,j)*(
         +    cn(nmm)**2*(
         +   -recdx*(un(i,j,k)-un(im,j,k))
         +   -recdy*(vn(i,j,k)-vn(i,jm,k)))
         +   -a*pn(i,j,k)/cn(nmm)**2
         +   -dampu(i,j)*cn(nmm)/dx*pn(i,j,km)
         +   -dampv(i,j)*cn(nmm)/dx*pn(i,j,km)
         +   -damp(i,j)*pn(i,j,km)
         +   )

          rhon(i,j)=-pn(i,j,kp)/g
          wn(i,j)=
         +   -recdx*(un(i,j,kp)-un(im,j,kp))
         +   -recdy*(vn(i,j,kp)-vn(i,jm,kp))

          endif
     9001 continue
     9000 continue
    !$acc end parallel loop
Jovi DSilva
  • What is the question? What does it say exactly? – Vladimir F Героям слава Jan 13 '14 at 09:23
  • It seems that your loop has some complex dependencies across iterations. You can try to force the compiler to generate the kernel and ignore the dependency detection by using the **independent** keyword, but that would probably produce a sequential kernel. What compiler are you using and where have you written the pragmas? – Ruyk Jan 13 '14 at 10:39
  • I need to know if it's possible to run the loop on the GPU with OpenACC. – Jovi DSilva Jan 13 '14 at 15:06
  • @Ruyk I have added the pragmas in the edited code. I have tried both the !$acc sections and the !$acc parallel loop pragmas; both work out the same in my case. I am using PGI Fortran Workstation 2013. – Jovi DSilva Jan 13 '14 at 15:12
  • I think it's probably possible to parallelize this, but you haven't shown enough of what is going on. I'm guessing there is a loop somewhere in `k` that you haven't shown. Anyway, you've got loop-carried dependencies, for example the `un` calculation depends on other neighboring `un` quantities (e.g. `un(i,j,km)`), and the compiler will choke on this. One approach would be to have two `un` arrays, e.g. `un` and `unn`, and ping-pong your calculations between them, i.e. `unn(i,j,kp)=un(i,j,km)+ tdt...` etc. Also, I would normally try `!$acc kernels` first before trying others. – Robert Crovella Jan 15 '14 at 15:19
  • I will try that. There is only one loop before this one, which calculates zetvn(i,j) and zetun(i,j); it has the same start and end bounds as this one. I will try what you said. In the meantime I will attach what I did to try to solve the problem. I did it, but there is no gain in performance. – Jovi DSilva Jan 28 '14 at 04:50
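
The suggestions in the comments (separate input and output arrays, `!$acc kernels`, and the `independent` clause) can be sketched as follows. This is a simplified stand-in, not the poster's model: the module `pingpong_sketch`, the subroutine `step_split`, and the arrays `uold`, `unew` and `w` are hypothetical names, and the stencil is deliberately trivial. One thing visible in the posted loop is that the `wn` line reads `un(im,j,kp)` and `vn(i,jm,kp)`, which other iterations write during the same pass; assuming `kp` is a different index from `k` and `km`, that looks like the only true cross-iteration dependence, so the sketch moves the corresponding step into a second loop nest.

```fortran
      module pingpong_sketch
      implicit none
      contains

! Hypothetical sketch. Loop nest 1 reads only uold and writes only
! unew, so its iterations do not depend on each other. Loop nest 2
! reads neighbouring unew values, so it runs only after nest 1 is
! complete instead of sharing an iteration with it.
      subroutine step_split(uold, unew, w, n, m, c)
      integer, intent(in) :: n, m
      real, intent(in) :: uold(n,m), c
      real, intent(inout) :: unew(n,m), w(n,m)
      integer :: i, j
!$acc kernels copyin(uold) copy(unew, w)
!$acc loop independent
      do j = 2, m-1
!$acc loop independent
         do i = 2, n-1
            unew(i,j) = uold(i,j) + c*(uold(i+1,j) - uold(i-1,j))
         end do
      end do
!$acc loop independent
      do j = 2, m-1
!$acc loop independent
         do i = 2, n-1
            w(i,j) = -(unew(i,j) - unew(i-1,j))
         end do
      end do
!$acc end kernels
      end subroutine step_split

      end module pingpong_sketch
```

Without an OpenACC-enabled compiler the directives are treated as comments and the routine runs serially, which is a convenient way to check that the split version gives the same numbers as the original loop before moving it to the GPU. Note that `independent` is a promise to the compiler, not a check: it is only safe here because each nest writes an array it does not read at other indices.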

1 Answer

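You have a data dependency between loop iterations, and that means your algorithm, as written, is inherently sequential.

A simple example is the difference between the Gauss-Seidel and Jacobi iterations. Gauss-Seidel updates the array in place, so each point uses neighbours that were already updated in the same sweep; Jacobi reads only the previous iterate and writes to a separate array. That is why people use Jacobi on GPUs and not Gauss-Seidel.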
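
To make the contrast concrete, here is a minimal one-dimensional sketch. The module `sweep_sketch` and both subroutines are hypothetical and use a plain neighbour average, not the equations from the question. Only the Jacobi-style loop carries an OpenACC directive, because it is the only one whose iterations can safely run in any order.

```fortran
      module sweep_sketch
      implicit none
      contains

! Gauss-Seidel style: u is updated in place, so u(i-1) on the
! right-hand side is the value just written by iteration i-1.
! The result depends on the order of the iterations.
      subroutine gauss_seidel_sweep(u, n)
      integer, intent(in) :: n
      real, intent(inout) :: u(n)
      integer :: i
      do i = 2, n-1
         u(i) = 0.5*(u(i-1) + u(i+1))
      end do
      end subroutine gauss_seidel_sweep

! Jacobi style: reads only uold and writes only unew, so every
! iteration is independent of the others.
      subroutine jacobi_sweep(uold, unew, n)
      integer, intent(in) :: n
      real, intent(in) :: uold(n)
      real, intent(inout) :: unew(n)
      integer :: i
!$acc parallel loop copyin(uold) copy(unew)
      do i = 2, n-1
         unew(i) = 0.5*(uold(i-1) + uold(i+1))
      end do
      end subroutine jacobi_sweep

      end module sweep_sketch
```

Starting from the same array, one sweep of each routine generally gives different values, since the in-place version propagates new information along the sweep direction within a single pass. The two schemes are different iterations that converge to the same solution at different rates, so switching between them changes the numerics, not only the performance.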

JimBamFeng