
Open MPI Version: v4.0.0

Output of `ompi_info | head` on the two machines:

mpiuser@s2:~$ ssh s1 ompi_info | head
                 Package: Open MPI mpiuser@s1 Distribution
                Open MPI: 4.0.0
  Open MPI repo revision: v4.0.0
   Open MPI release date: Nov 12, 2018
                Open RTE: 4.0.0
  Open RTE repo revision: v4.0.0
   Open RTE release date: Nov 12, 2018
                    OPAL: 4.0.0
      OPAL repo revision: v4.0.0
       OPAL release date: Nov 12, 2018
mpiuser@s2:~$ ompi_info | head
                 Package: Open MPI mpiuser@s2 Distribution
                Open MPI: 4.0.0
  Open MPI repo revision: v4.0.0
   Open MPI release date: Nov 12, 2018
                Open RTE: 4.0.0
  Open RTE repo revision: v4.0.0
   Open RTE release date: Nov 12, 2018
                    OPAL: 4.0.0
      OPAL repo revision: v4.0.0
       OPAL release date: Nov 12, 2018

Both installations live on a common shared network filesystem.

Running the command locally on s1 (master) works:

mpiuser@s1:/disk3/cloud/openmpi-4.0.0/examples$ mpirun -n 2 ./hello
Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI mpiuser@s1 Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018, 112)
Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI mpiuser@s1 Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018, 112)

Running the command locally on s2 (slave) also works:

mpiuser@s2:~/cloud$ mpirun -n 2 ./hello
Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI mpiuser@s2 Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018, 113)
Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI mpiuser@s2 Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12, 2018, 113)

Checking for an hwloc package on s2 (nothing installed):

mpiuser@s2:~/cloud/openmpi-4.0.0$ dpkg -l | grep hwloc
mpiuser@s2:~/cloud/openmpi-4.0.0$

Checking for an hwloc package on s1 (nothing installed):

mpiuser@s1:/disk3/cloud/openmpi-4.0.0/examples$ dpkg -l | grep hwloc
mpiuser@s1:/disk3/cloud/openmpi-4.0.0/examples$

Both machines are running Ubuntu 16.04.5 LTS.

However, running the command distributed across both hosts gives the following error:

mpiuser@s1:/disk3/cloud/openmpi-4.0.0/examples$ mpirun -host s1,s2 ./hello
[s2:26283] [[40517,0],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_direct.c at line 355
--------------------------------------------------------------------------
An internal error has occurred in ORTE:

[[40517,0],1] FORCE-TERMINATE AT Data unpack would read past end of buffer:-26 - error grpcomm_direct.c(359)

This is something that should be reported to the developers.
--------------------------------------------------------------------------
Rahul Kulhari

1 Answer


Please see this post as the answer. The problem may come from a missing link against the zlib library, which is used to compress data sent from one host to another. Make sure zlib.h is present in /usr/include. If it is not, run sudo apt install zlib1g-dev and then rebuild and reinstall Open MPI entirely (configure, make, and make install).
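Before rebuilding, the presence of the header can be confirmed with a quick shell check (a minimal sketch; it assumes the default Debian/Ubuntu include path, and the install command is only a suggestion for apt-based systems):

```shell
# Check whether the zlib development header is installed
# (assumes the standard Debian/Ubuntu location /usr/include/zlib.h).
if [ -f /usr/include/zlib.h ]; then
    echo "zlib.h found"
else
    echo "zlib.h missing: run 'sudo apt install zlib1g-dev', then rebuild Open MPI"
fi
```

Run the check on every host (e.g. via ssh), since a header present on s1 but missing on s2 would still produce a build mismatch between the two installations.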

Joachim