I am trying to learn how to perform inter-GPU data communication using the following toy code. The task of the program is to send the data of array 'a' from gpu0 into gpu1's memory. I took the following route, which involves four steps.
After initializing array 'a' on gpu0:
- step1: send data from gpu0 to cpu0 (using the !$acc update self() directive)
- step2: send data from cpu0 to cpu1 (using MPI_SEND())
- step3: receive data into cpu1 from cpu0 (using MPI_RECV())
- step4: update gpu1 device memory (using the !$acc update device() directive)
This works perfectly fine, but it is a very long route, and I think there is a better way of doing this. I tried to read up on the !$acc host_data use_device directive suggested in the following post, but I was not able to implement it:
Getting started with OpenACC + MPI Fortran program
I would like to know how !$acc host_data use_device can be used to perform the task shown below efficiently.
PROGRAM TOY_MPI_OpenACC
    implicit none
    include 'mpif.h'
    integer :: rank, nprocs, ierr, i, dest_rank, tag, from
    integer :: status(MPI_STATUS_SIZE)
    integer, parameter :: N = 10000
    double precision, dimension(N) :: a

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
    print*, 'Process ', rank, ' of', nprocs, ' is alive'

    !$acc data create(a)

    ! initialize 'a' on gpu0 (not cpu0)
    IF (rank == 0) THEN
        !$acc parallel loop default(present)
        DO i = 1,N
            a(i) = 1
        ENDDO
    ENDIF

    ! step1: send data from gpu0 to cpu0
    !$acc update self(a)
    print*, 'a in rank', rank, ' before communication is ', a(N/2)

    IF (rank == 0) THEN
        ! step2: send from cpu0
        dest_rank = 1; tag = 1999
        call MPI_SEND(a, N, MPI_DOUBLE_PRECISION, dest_rank, tag, MPI_COMM_WORLD, ierr)
    ELSEIF (rank == 1) THEN
        ! step3: receive into cpu1
        from = MPI_ANY_SOURCE; tag = MPI_ANY_TAG
        call MPI_RECV(a, N, MPI_DOUBLE_PRECISION, from, tag, MPI_COMM_WORLD, status, ierr)
        ! step4: send data into gpu1 from cpu1
        !$acc update device(a)
    ENDIF

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    print*, 'a in rank', rank, ' after communication is ', a(N/2)

    !$acc end data

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    ! shut MPI down cleanly before the program exits
    call MPI_FINALIZE(ierr)
END
Compilation: mpif90 -acc -ta=tesla toycode.f90
(mpif90 from NVIDIA HPC SDK 21.9)
Execution: mpirun -np 2 ./a.out
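For reference, the host_data use_device pattern mentioned above generally looks like the sketch below. Inside the host_data region, the name 'a' refers to the address of the device copy, so MPI is handed GPU pointers and steps 1 and 4 are no longer needed for the transfer itself. This is a minimal, untested sketch rather than a verified solution. It assumes the MPI library is CUDA-aware (GPU-aware) and that each rank is already attached to its own GPU; neither is checked here. The program name and the fixed tag value are illustrative choices.

```fortran
program toy_mpi_openacc_hostdata   ! hypothetical name, for illustration only
    implicit none
    include 'mpif.h'
    integer :: rank, nprocs, ierr, i
    integer :: status(MPI_STATUS_SIZE)
    integer, parameter :: N = 10000
    integer, parameter :: tag = 1999
    double precision, dimension(N) :: a

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    ! zero the host copy so every rank starts from a known device state
    a = 0.0d0

    !$acc data copyin(a)

    ! initialize 'a' on gpu0 only
    if (rank == 0) then
        !$acc parallel loop default(present)
        do i = 1, N
            a(i) = 1.0d0
        end do
    end if

    ! Inside this region 'a' resolves to the device address, so the MPI
    ! calls receive GPU pointers. ASSUMPTION: the MPI build is CUDA-aware;
    ! with a non-CUDA-aware MPI this would fail or crash.
    !$acc host_data use_device(a)
    if (rank == 0) then
        call MPI_SEND(a, N, MPI_DOUBLE_PRECISION, 1, tag, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
        call MPI_RECV(a, N, MPI_DOUBLE_PRECISION, 0, tag, MPI_COMM_WORLD, status, ierr)
    end if
    !$acc end host_data

    ! copy back to the host only so the result can be printed
    !$acc update self(a)
    print *, 'a in rank', rank, ' after communication is ', a(N/2)

    !$acc end data

    call MPI_FINALIZE(ierr)
end program toy_mpi_openacc_hostdata
```

It would be built and launched the same way as the original (mpif90 -acc -ta=tesla, then mpirun -np 2 ./a.out). The update self(a) at the end is there only for the print statement and is not part of the GPU-to-GPU transfer.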