Each MPI task issues its CUDA calls to whichever GPU is currently selected, and you choose that GPU with the function cudaSetDevice(). In your case, since each node contains 2 GPUs, you can switch between them with cudaSetDevice(0) and cudaSetDevice(1).
If you don't specify the GPU with cudaSetDevice(), typically by deriving the device index from the MPI task rank (see the sketch below), I believe both MPI tasks will submit their CUDA work to the same default GPU (device 0), where the kernels simply get serialized. Likewise, if you run 3 or more MPI tasks per node, at least two of them are guaranteed to share a GPU, so their kernels will again be serialized (or time-sliced) on that device rather than running in parallel; strictly speaking that is GPU contention rather than a race condition, but either way you lose the parallelism you are after.