I am trying to run OpenMPI code on an NVIDIA Jetson TX2, but I get an OPAL error when I run mpiexec.
Compilation command:
$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 *.cu *.cpp -o program
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
Execution error message:
$ mpiexec -np 4 ./program
[user:05728] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[user:05728] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[user:05729] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[user:05729] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[7361,1],0]
Exit code: 1
--------------------------------------------------------------------------
I installed OpenMPI version 3.1.2 using the following commands:
$ ./configure --prefix="/home/user/.openmpi" --with-cuda
$ make; sudo make install
I have also set up my $PATH and $LD_LIBRARY_PATH variables accordingly, following the instructions from this link.
The same program runs successfully on my laptop (Intel i7). While searching for the error, I found several posts suggesting that I reinstall OpenMPI. I have done so multiple times (including with a fresh download of the library), without success.
Any help would be greatly appreciated!
Edits
As requested in the comments, I tried running the following minimal code (main.cpp):
#include <iostream>
#include "mpi.h"
#include <string>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::cout << rank << '\n';
    MPI_Finalize();
    return 0;
}
I compiled it with the same nvcc command as before, and running it produces the same error:
$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 main.cpp -o program
However, if I compile it with mpic++, it runs perfectly fine:
$ mpic++ main.cpp -o ./program
$ mpiexec -np 4 ./program
0
1
3
2