I am trying to run OpenMPI code on an NVIDIA Jetson TX2, but I get an OPAL error when I run mpiexec.

Compilation instruction:

$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 *.cu *.cpp -o program
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

Execution error message:

$ mpiexec -np 4 ./program 
[user:05728] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[user:05728] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[user:05729] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[user:05729] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[7361,1],0]
  Exit code:    1
--------------------------------------------------------------------------

I installed OpenMPI version 3.1.2 using the following instructions:

$ ./configure --prefix="/home/user/.openmpi" --with-cuda
$ make; sudo make install

I have also set up my $PATH and $LD_LIBRARY_PATH variables according to the instructions from this link.

I am able to successfully execute the program on my laptop (Intel i7). Upon looking up the error I found some links suggesting that I reinstall OpenMPI. I have tried doing so multiple times (including a fresh download of the library) without any success.

Any help would be greatly appreciated!

Edits

I tried running the following minimal code (main.cpp), as asked in the comments:

#include <iostream>
#include "mpi.h"
#include <string>

int main(int argc, char *argv[]) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  std::cout << rank << '\n';
  MPI_Finalize();
  return 0;
}

To compile this, I reran the previous command and got the same error:

$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 main.cpp -o program

But if I compile it with mpic++, it runs perfectly fine:

$ mpic++ main.cpp -o ./program
$ mpiexec -np 4 ./program 
0
1
3
2
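
Since the mpic++ build works and the hand-written nvcc line does not, it may help to print what the wrapper actually passes to the compiler and linker. This is a minimal sketch, assuming the Open MPI wrapper mpic++ is on PATH (the --showme options are specific to Open MPI's wrappers); show_wrapper_flags is a hypothetical helper name used only for illustration.

```shell
# Print the compile and link flags that the Open MPI wrapper adds,
# so they can be compared with the flags given to nvcc by hand.
show_wrapper_flags() {
  if ! command -v mpic++ >/dev/null 2>&1; then
    echo "mpic++ not found on PATH"
    return 0
  fi
  # --showme:compile and --showme:link are Open MPI wrapper options.
  echo "compile flags: $(mpic++ --showme:compile)"
  echo "link flags:    $(mpic++ --showme:link)"
}

show_wrapper_flags
```

Any library directory, rpath, or library entries that appear in this output but not in the nvcc command are candidates for the difference in behaviour.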
John.Ludlum

1 Answer

Is this the only version of OpenMPI that you have installed? My guess is that you're using different MPI installations for your build and your run. Check the output of which mpirun, and also search for other instances of mpirun on the system. If you're on Ubuntu, run:

sudo updatedb
locate mpirun

If you call the correct mpirun (the same version used to build) then the error should go away.
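
The check described above can be sketched as a small script. It assumes a POSIX shell; report_mpi_tools is a hypothetical helper name, and ./program is the executable name taken from the question. It only reports what is found and does not change anything.

```shell
# Report where each MPI tool on PATH resolves to, so a mismatch between
# the build-time and run-time installation is easy to spot.
report_mpi_tools() {
  for tool in mpic++ mpirun mpiexec; do
    if resolved=$(command -v "$tool" 2>/dev/null); then
      echo "$tool -> $resolved"
    else
      echo "$tool -> not found on PATH"
    fi
  done
}

report_mpi_tools

# Show which MPI shared library the binary resolves at run time.
# Skipped when ./program or ldd is not available.
if [ -x ./program ] && command -v ldd >/dev/null 2>&1; then
  ldd ./program | grep -i mpi || echo "no MPI library listed by ldd"
fi
```

If the paths printed for mpirun and mpiexec, or the libmpi path reported by ldd, do not sit under the prefix used at configure time (/home/user/.openmpi in the question), that would be consistent with a second installation being picked up.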

Steven Walton