Getting started with OpenACC + MPI Fortran program

Question

I have a working serial code and a working parallel single GPU code parallelized via OpenACC. Now I am trying to increase the parallelism by running on multiple GPUs, employing mpi+openacc paradigm. I wrote my code in Fortran-90 and compile it using Nvidia's HPC-SDK's nvfortran compiler.

I have a few beginner level questions:

How do I setup my compiler environment to start writing my mpi+openacc code. Are there any extra requirements other than Nvidia's HPC-SDK?
Assuming I have a code written under mpi+openacc setup, how do I compile it exactly? Do I have to compile it two times? one for cpus (mpif90) and one for gpus (openacc). An example of a make file or some compilation commands will be helpful.
When the communication between GPU-device-1 and GPU-device-2 is needed, is there a way to communicate directly between them, or I should be communicating via [GPU-device-1] ---> [CPU-host-1] ---> [CPU-host-2] ---> [GPU-device-2]
Are there any sample Fortran codes with mpi+openacc implementation?

Welcome, I suggest taking the [tour] and reading [ask]. Please not that your question is really broad with several independent numbered points. You should normally ask one clear question. — Vladimir F Героям слава, Dec 02 '21 at 09:43

score 2 · Accepted Answer · answered Dec 02 '21 at 14:44

As @Vladimir F pointed out, your question is very broad, so if you have further questions about specific points you should consider posting each point individually. That said, I'll try to answer each.

If you install NVIDIA HPC SDK you should have everything you need. It'll include an installation of OpenMPI plus NVIDIA's HPC compilers for the OpenACC. You'll also have a variety of math libraries, if you need those too.
Compile using mpif90 for everything. For instance, mpif90 -acc=gpu will build the files with OpenACC to include GPU support and files that don't include OpenACC will compile normally. The MPI module should be found automatically during compilation and the MPI libraries will be linked in.
You can use the acc host_data use_device directive to pass the GPU version of your data to MPI. I don't have a Fortran example with MPI, but it looks similar to the call in this file. https://github.com/jefflarkin/openacc-interoperability/blob/master/openacc_cublas.f90#L19
This code uses both OpenACC and MPI, but doesn't use the host_data directive I referenced in 3. If I find another, I'll update this answer. It's a common pattern, but I don't have an open code handy at the moment. https://github.com/UK-MAC/CloverLeaf_OpenACC

Getting started with OpenACC + MPI Fortran program

1 Answers1

Linked