I am building an MPI application using mpi4py (1.3.1) and openmpi (1.8.6-1) on Arch Linux ARM (on a Raspberry Pi cluster, to be more specific). My program runs successfully on 3 nodes (4 processes), but when I try to add a new node, here's what happens:
Host key verification failed.
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The funny thing is that the ssh keys are fine, since I'm using the same nodes: I can remove any entry from the host file, add the new node in its place, and it will work. So I am pretty sure that the problem is not a misconfigured ssh setup. It only happens when I use 5 processes.
Could this be a bug in the library of some sort?
Here's my host file:
192.168.1.26 slots=2
192.168.1.188 slots=1
#192.168.1.202 slots=1 If uncommented and run with -np 5, it will raise the error
192.168.1.100 slots=1
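For reference, here is a minimal sketch that tallies the slots this host file provides. `parse_hostfile` is a hypothetical helper written only for illustration (it is not part of Open MPI or mpi4py), and treating a host without `slots=` as one slot is an assumption of the sketch, not documented behaviour.

```python
# Hypothetical helper for illustration; not part of Open MPI or mpi4py.
def parse_hostfile(text):
    """Return (host, slots) pairs for every active line of a host file."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        slots = 1  # assumption: a host without slots= counts as one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        entries.append((host, slots))
    return entries


# The host file from this post, with the fourth node still commented out.
HOSTFILE = """\
192.168.1.26 slots=2
192.168.1.188 slots=1
#192.168.1.202 slots=1
192.168.1.100 slots=1
"""

entries = parse_hostfile(HOSTFILE)
total_slots = sum(slots for _, slots in entries)
print(len(entries), "hosts,", total_slots, "slots")
```

With the line for 192.168.1.202 commented out this counts 3 hosts and 4 slots; uncommenting it gives 4 hosts and 5 slots, which is the configuration that fails with -np 5.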
Thanks in advance!