I have a working serial code and a working single-GPU version parallelized with OpenACC. I am now trying to scale it to multiple GPUs using the MPI+OpenACC paradigm. The code is written in Fortran 90 and compiled with the nvfortran compiler from NVIDIA's HPC SDK.
I have a few beginner-level questions:
- How do I set up my compiler environment to start writing MPI+OpenACC code? Are there any extra requirements beyond NVIDIA's HPC SDK?
- Assuming I have code written for MPI+OpenACC, how exactly do I compile it? Do I have to compile it twice, once for the CPUs (mpif90) and once for the GPUs (OpenACC)? An example makefile or some compilation commands would be helpful.
- When GPU-device-1 and GPU-device-2 need to exchange data, is there a way for them to communicate directly, or do I have to go through the hosts: [GPU-device-1] ---> [CPU-host-1] ---> [CPU-host-2] ---> [GPU-device-2]?
- Are there any sample Fortran codes that use MPI+OpenACC?