I've been falling in love with the ease of use of Fortran's Coarrays framework, because of how clean it is compared to lower-level APIs like MPI.
But one thing I haven't been able to tease out is whether there is a way to explicitly tell Fortran to perform puts and gets asynchronously. The benefit of this would be to replicate MPI's MPI_I* calls, which allow overlapping communication and computation.
I'm interested in overlapping purely for performance reasons. The particular application I have in mind is CFD with particle methods, where the domain is subdivided and halo particles are exchanged every time-step. Using MPI point-to-point calls, which I'm currently more familiar with, I initiate the exchange of particle information between processes and then perform computation while the communications complete, kind of like:
! slots for my own rank are never filled in the loop below
request = MPI_REQUEST_NULL
do pid = 0, numprocs-1
   if (pid /= procid) then
      ! post sends
      call MPI_ISEND(neighbours(pid+1)%sendbuff, &
                     neighbours(pid+1)%n_send,   &
                     particle_derived_type,      &
                     pid,                        &
                     0,                          &
                     MPI_COMM_WORLD,             &
                     request(pid+1),             &
                     ierr)
      ! post receives
      call MPI_IRECV(neighbours(pid+1)%recvbuff, &
                     neighbours(pid+1)%n_recv,   &
                     particle_derived_type,      &
                     pid,                        &
                     0,                          &
                     MPI_COMM_WORLD,             &
                     request(numprocs+pid+1),    &
                     ierr)
   end if
end do
! do some heavy computation
call MPI_WAITALL(2*numprocs, request, status, ierr)
This is just for demonstration. In reality, each process would only communicate with its neighbour processes, not all of them. The advantage of using MPI_ISEND/MPI_IRECV here is that I don't have to worry about locking, and that I can do some computation while the sends and receives are completing.
A kind of equivalent example using Coarrays:
do pid = 1, numprocs
   if (pid /= this_image()) then
      ! put data into remote neighbour images
      n_send = neighbours(pid)%n_send
      neighbours(this_image())[pid]%recv_buff(1:n_send) = neighbours(pid)%send_buff(1:n_send)
   end if
end do
! do some heavy computation
sync all
which is cool, because it's much more compact. But I'm not sure whether the "puts" return immediately after initiating the transfer, the way MPI_ISEND/MPI_IRECV do.
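One thing I've been toying with (though I'm not sure it actually makes the puts asynchronous) is narrowing the synchronisation from sync all to just the images I actually exchange with, via sync images. A minimal sketch, assuming a hypothetical neighbour_images(1:n_neighbours) array holding the image indices this image talks to:

do i = 1, n_neighbours
   pid = neighbour_images(i)
   n_send = neighbours(pid)%n_send
   ! one-sided put into the neighbour's receive buffer
   neighbours(this_image())[pid]%recv_buff(1:n_send) = neighbours(pid)%send_buff(1:n_send)
end do
! do some heavy computation that doesn't touch send_buff/recv_buff
! synchronise only with the images involved in the exchange
sync images (neighbour_images(1:n_neighbours))

This at least avoids a global barrier, but it doesn't tell me whether the transfers themselves are progressing in the background during the computation.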
So for this example, I'm interested in replicating the ability of the MPI_I* calls to overlap communication with computation using Fortran Coarrays, as this overlap is pretty important for optimising the performance of CFD simulations.
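In case it helps frame the question: with Fortran 2018 events I could also replace the global sync all with per-neighbour notification, although as far as I can tell this still doesn't guarantee that the put itself completes asynchronously. A rough sketch, again assuming the hypothetical neighbour_images/n_neighbours from above, plus a coarray of events halo_arrived with one element per source image:

use, intrinsic :: iso_fortran_env, only: event_type
type(event_type), allocatable :: halo_arrived(:)[:]
allocate(halo_arrived(num_images())[*])

do i = 1, n_neighbours
   pid = neighbour_images(i)
   n_send = neighbours(pid)%n_send
   neighbours(this_image())[pid]%recv_buff(1:n_send) = neighbours(pid)%send_buff(1:n_send)
   ! tell image pid that its halo data from this image has been put
   event post (halo_arrived(this_image())[pid])
end do
! do some heavy computation
do i = 1, n_neighbours
   ! block only until the halo from this particular neighbour has arrived
   event wait (halo_arrived(neighbour_images(i)))
end do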
EDIT: Hopefully a clearer explanation of why I want to overlap comms with computation.