We are trying to implement a generic n-body algorithm that runs across multiple nodes. Each node has 2 GPUs and 1 CPU.
We want to run the n-body calculation only on the GPUs, using OpenACC. After doing some research on OpenACC, I am unsure how to spread the calculation across multiple GPUs.
Is it possible to use 2 GPUs with only one thread and OpenACC? If not, what would be a suitable approach: using OpenMP to drive both GPUs on one node and communicating with the other nodes via MPI?
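
To make the single-thread variant concrete, here is a minimal sketch of what it could look like. It is an illustration under assumptions, not a verified solution. It assumes one host thread selects each device in turn with `acc_set_device_num` and launches its share of the bodies with the `async` clause. The names `chunk_for_device` and `compute_accel` are hypothetical, and the interaction weight is a placeholder rather than the real gravitational term. Without an OpenACC compiler the pragmas are ignored and the same loops run on the host.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Half-open range [begin, end) of bodies owned by one device. */
typedef struct { size_t begin, end; } chunk_t;

/* Split n bodies as evenly as possible across num_devices devices. */
static chunk_t chunk_for_device(size_t n, int num_devices, int dev)
{
    assert(num_devices > 0 && dev >= 0 && dev < num_devices);
    size_t base = n / (size_t)num_devices;
    size_t rem  = n % (size_t)num_devices;
    size_t d    = (size_t)dev;
    chunk_t c;
    c.begin = d * base + (d < rem ? d : rem);
    c.end   = c.begin + base + (d < rem ? 1 : 0);
    return c;
}

/* One host thread drives all devices. pos and accel hold 3 doubles per body.
 * Every device reads all positions but writes only its own chunk. */
static void compute_accel(size_t n, const double *pos, const double *mass,
                          double *accel, int num_devices)
{
    const double eps2 = 1e-9; /* softening, avoids division by zero for i == j */

    for (int dev = 0; dev < num_devices; ++dev) {
        chunk_t c = chunk_for_device(n, num_devices, dev);
        size_t begin = c.begin, end = c.end;
        if (begin == end)
            continue; /* nothing to do for this device */
#ifdef _OPENACC
        acc_set_device_num(dev, acc_device_not_host);
#endif
        /* async: return to the host right away so the next device can start. */
        #pragma acc parallel loop async \
            copyin(pos[0:3*n], mass[0:n]) \
            copyout(accel[3*begin:3*(end-begin)])
        for (size_t i = begin; i < end; ++i) {
            double ax = 0.0, ay = 0.0, az = 0.0;
            for (size_t j = 0; j < n; ++j) {
                double dx = pos[3*j]     - pos[3*i];
                double dy = pos[3*j + 1] - pos[3*i + 1];
                double dz = pos[3*j + 2] - pos[3*i + 2];
                double r2 = dx*dx + dy*dy + dz*dz + eps2;
                /* Placeholder weight m/r^2; a real n-body kernel would use
                 * m * r2^(-3/2), which needs sqrt from <math.h>. */
                double w = mass[j] / r2;
                ax += w * dx; ay += w * dy; az += w * dz;
            }
            accel[3*i] = ax; accel[3*i + 1] = ay; accel[3*i + 2] = az;
        }
    }

#ifdef _OPENACC
    /* Wait for the asynchronous work queued on each device. */
    for (int dev = 0; dev < num_devices; ++dev) {
        acc_set_device_num(dev, acc_device_not_host);
        #pragma acc wait
    }
#endif
}
```

In this sketch the caller would pass the result of `acc_get_num_devices(acc_device_not_host)` as `num_devices`. For the multi-node case, each MPI rank could call `compute_accel` on its own subset of bodies and exchange positions between steps. That part is not shown here.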