I am trying to run a simple MPI example on a cluster with multiple compute nodes. For now I am using just two test nodes, gpu8 and gpu12.

What I have done so far:

  • gpu8 and gpu12 have the correct MPI environment (OpenMPI-4.0.1). I can successfully run the MPI example on a single node.
  • Passwordless login between gpu8 and gpu12 has been set up. Each node can ssh to the other with no issues.
  • There is a hostfile on each node containing
gpu8
gpu12
  • The executable is at the same path on both nodes.
  • echo $PATH (on both nodes) gives
/home/user_1/share/local/openmpi-4.0.1/bin:xxxxxx
  • echo $LD_LIBRARY_PATH (on both nodes) gives
/home/t716/shshi/share/local/openmpi-4.0.1/lib:

The ORTE problem:

I am running `mpirun -np 2 --hostfile /home/user_2/hosts ./home/user_2/mpi-hello-world/mpi_hello_world`. The error output is:

bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------
Joxixi
  • Try replacing `mpirun` by `$(which mpirun)` – Gilles Gouaillardet May 11 '20 at 09:10
  • @GillesGouaillardet, thanks for your kind help! It definitely works! So does this mean it is OK if different nodes have different OpenMPI paths? – Joxixi May 11 '20 at 09:55
  • Not quite. `mpirun` will `ssh ... orted` under the hood, so if your `$PATH` is not propagated, `orted` won't be found. OTOH, `/.../bin/mpirun` will `ssh /.../bin/orted`, so as long as Open MPI is installed in the same directory on all nodes, that will be fine. FWIW, I always `configure --enable-orterun-prefix-by-default` so I do not need to worry about `$PATH` or use absolute paths. – Gilles Gouaillardet May 11 '20 at 12:48
  • Does this answer your question? [OpenMPI: Simple 2-Node Setup](https://stackoverflow.com/questions/22925515/openmpi-simple-2-node-setup) – Joachim Apr 28 '21 at 12:24
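
The workaround from the comments above can be sketched in shell as follows. This is a rough illustration, not a verified fix for this particular cluster. `ompi_prefix` is a hypothetical helper name, the hostfile and executable paths are copied from the question as written, and the sketch assumes Open MPI is installed under the same directory on both nodes. The cluster-specific commands are left commented out because they need the actual nodes.

```shell
#!/bin/sh
# Hypothetical helper: derive the Open MPI install prefix from an
# absolute mpirun path, e.g. /opt/ompi/bin/mpirun -> /opt/ompi
ompi_prefix() {
    dirname "$(dirname "$1")"
}

# Only report something if mpirun is actually on this node's PATH.
if command -v mpirun >/dev/null 2>&1; then
    MPIRUN=$(command -v mpirun)
    echo "mpirun: $MPIRUN"
    echo "prefix: $(ompi_prefix "$MPIRUN")"
fi

# Check what a non-interactive ssh session sees on the remote node.
# If this prints nothing, a bare `orted` will not be found there:
#   ssh gpu12 'command -v orted'

# Option 1: invoke mpirun by absolute path, as suggested in the comments:
#   $(which mpirun) -np 2 --hostfile /home/user_2/hosts ./home/user_2/mpi-hello-world/mpi_hello_world

# Option 2: pass the install prefix explicitly with mpirun's --prefix option:
#   mpirun --prefix "$(ompi_prefix "$(which mpirun)")" -np 2 \
#       --hostfile /home/user_2/hosts ./home/user_2/mpi-hello-world/mpi_hello_world
```

Both options rely on the same idea: the remote node is asked for `orted` under a known install directory instead of depending on the `$PATH` of a non-interactive shell.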
