Following the MpichClusterUbuntu guidelines, I'm trying to run my very first MPI program on a PC with Ubuntu 18.04.01 Server Edition and a laptop with Ubuntu 18.04.02 Desktop. Up to step 11 of the guidelines, everything went fine, with no problems at all.

I set up a machinefile called hosts with these two lines:

192.168.1.7 # first 'master' node: the PC
192.168.1.5 # second node: the laptop

I compiled the very simple example file contained in the guidelines:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int myrank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    printf("Hello from processor %d of %d\n", myrank, nprocs);
    MPI_Finalize();
    return 0;
}

mpiu@pc01:~$ mpicc mpi_hello.c -o mpi_hello

Running it without the machinefile 'hosts' gives this output:

mpiu@pc01:~$ mpiexec -n 8 ./mpi_hello
------------------------------------------------------------------
[[27419,1],0]: A high-performance Open MPI point-to-point messaging 
module was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: pc01

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
----------------------------------------------------------------
Hello from processor 1 of 8
Hello from processor 2 of 8
Hello from processor 5 of 8
Hello from processor 6 of 8
Hello from processor 0 of 8
Hello from processor 3 of 8
Hello from processor 7 of 8
Hello from processor 4 of 8
[pc01:25010] 7 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[pc01:25010] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

But when I run it with the machinefile 'hosts', the execution hangs without producing any output:

mpiu@pc01:~$ mpiexec -n 8 -machinefile hosts ./mpi_hello

PS: this is the content of /etc/netplan/50-cloud-init.yaml in the "master" node (PC):

network:
    ethernets:
        enp3s0:
            #addresses: []
            #dhcp4: true
            addresses: [192.168.1.7/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
            dhcp4: no
    version: 2

Update: following Gilles's correct comment, I removed Open MPI, which I guess had been installed previously.
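
Which implementation a given mpiexec belongs to can be checked from its version banner. A minimal sketch, assuming that Open MPI's launcher mentions "Open MPI" or "OpenRTE" and that MPICH's Hydra launcher mentions "HYDRA" in the output of mpiexec --version; classify_mpi is a hypothetical helper name, not part of either library:

```shell
# Hypothetical helper: classify the text printed by `mpiexec --version`.
# Assumption: Open MPI banners contain "Open MPI" or "OpenRTE",
# MPICH (Hydra) banners contain "HYDRA" or "MPICH".
classify_mpi() {
    case "$1" in
        *"Open MPI"*|*OpenRTE*) echo "openmpi" ;;
        *HYDRA*|*MPICH*)        echo "mpich" ;;
        *)                      echo "unknown" ;;
    esac
}

# Usage on a node (needs an MPI launcher in PATH):
#   classify_mpi "$(mpiexec --version 2>&1)"
```

If the result is not the library the guidelines are written for, the wrong mpiexec is probably first in PATH.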

Now, running step 11 of the MpichClusterUbuntu guidelines (Ubuntu 18.04):

A) Without the machinefile:

marco@pc01:/mirror$ mpiexec -n 8 ./mpi_hello
Hello from processor 0 of 8
Hello from processor 1 of 8
Hello from processor 3 of 8
Hello from processor 5 of 8
Hello from processor 6 of 8
Hello from processor 7 of 8
Hello from processor 2 of 8
Hello from processor 4 of 8

B) But calling the machinefile "hosts":

marco@pc01:/mirror$ mpiexec -n 8 -machinefile /home/mpiu/hosts ./mpi_hello
ssh: Could not resolve hostname pc0: Temporary failure in name resolution
ssh: Could not resolve hostname riccarcohp: Temporary failure in name resolution
^C[mpiexec@pc01] Sending Ctrl-C to processes as requested
[mpiexec@pc01] Press Ctrl-C again to force abort
[mpiexec@pc01] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad file descriptor)
[mpiexec@pc01] HYD_pmcd_pmiserv_send_signal (pm/pmiserv/pmiserv_cb.c:177): unable to write data to proxy
[mpiexec@pc01] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@pc01] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@pc01] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@pc01] main (ui/mpich/mpiexec.c:340): process manager error waiting for completion
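
The ssh errors above mean that the names listed in the machinefile could not be resolved from pc01. One quick thing to check is whether each name has an entry in /etc/hosts. A rough sketch; has_hosts_entry is a hypothetical helper, and /etc/hosts is only one of the sources the resolver may consult:

```shell
# Hypothetical helper: succeed if the hosts-format file $2 lists the name $1.
has_hosts_entry() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

# Usage on a node:
#   has_hosts_entry riccardohp /etc/hosts || echo "riccardohp is not in /etc/hosts"
```

Using plain IP addresses in the machinefile, as in the next attempt, avoids the name lookup altogether.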

After putting only the IP addresses in the machinefile 'hosts':

mpiu@pc01:/mirror$ mpiexec -n 8 -machinefile /home/mpiu/hosts ./mpi_hello
Permission denied, please try again.
Permission denied, please try again.
mpiu@192.168.1.5: Permission denied (publickey,password).

But I can ssh with no problems at all from the PC to the laptop:

mpiu@pc01:/mirror$ ssh 192.168.1.5
mpiu@192.168.1.5's password:
mpiu@riccardo-HP-Laptop-15-da0xxx:~$
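
Note that the session above still asks for a password. As far as I understand, mpiexec starts the remote processes through ssh with no terminal to type into, so an interactive login that works is not sufficient. A sketch of a non-interactive check, assuming OpenSSH's BatchMode option (which makes ssh fail rather than prompt); machinefile_hosts and check_passwordless_ssh are hypothetical helper names:

```shell
# Hypothetical helper: print the hosts listed in a machinefile, one per line.
# Drops '#' comments, blank lines and an optional ":N" suffix after the host.
machinefile_hosts() {
    awk '{ sub(/#.*/, "") } NF { split($1, a, ":"); print a[1] }' "$1"
}

# Hypothetical helper: try a key-only login to every host in the machinefile.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    for host in $(machinefile_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED (password needed or host unreachable)"
        fi
    done
}

# Usage, as the user that runs mpiexec:
#   check_passwordless_ssh /home/mpiu/hosts
```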

Now it seems SOLVED, even though all I did was repeat exactly the same procedure for the third time.

These are the steps I followed to set up passwordless SSH between pc01 (the PC) and riccardohp (the laptop):

marco@pc01:/$ su - mpiu
Password:
mpiu@pc01:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mpiu/.ssh/id_rsa):
Created directory '/home/mpiu/.ssh'.

To make it simpler, I left out the passphrase:

Your identification has been saved in /home/mpiu/.ssh/id_rsa.
Your public key has been saved in /home/mpiu/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:..... mpiu@pc01
The key's randomart image is:
+---[RSA 2048]----+
...................
...................
+----[SHA256]-----+

I copied the public key from pc01 to the laptop:

mpiu@pc01:~$ ssh-copy-id 192.168.1.5
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/mpiu/.ssh/id_rsa.pub"
The authenticity of host '192.168.1.5 (192.168.1.5)' can't be established.
ECDSA key fingerprint is SHA256:.......................
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys

mpiu@192.168.1.5's password:
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh '192.168.1.5'"
and check to make sure that only the key(s) you wanted were added.

mpiu@pc01:~$ ssh '192.168.1.5'
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.18.0-16-generic x86_64)
mpiu@riccardo-HP-Laptop-15-da0xxx:~$

So the SSH connection between pc01 and the laptop now apparently works fine, without a password prompt.

mpiu@riccardo-HP-Laptop-15-da0xxx:~$ ^C
mpiu@riccardo-HP-Laptop-15-da0xxx:~$ logout
Connection to 192.168.1.5 closed.
mpiu@pc01:~$ cd /
mpiu@pc01:/$ cd mirror
mpiu@pc01:/mirror$ mpicc mpi_hello.c -o mpi_hello
gcc: error: mpi_hello.c: No such file or directory
mpiu@pc01:/mirror$ nano mpi_hello.c
mpiu@pc01:/mirror$ mpicc mpi_hello.c -o mpi_hello
mpiu@pc01:/mirror$ mpiexec -n 8 ./mpi_hello
Hello from processor 0 of 8
Hello from processor 1 of 8
Hello from processor 2 of 8
Hello from processor 3 of 8
Hello from processor 4 of 8
Hello from processor 5 of 8
Hello from processor 6 of 8
Hello from processor 7 of 8

I put this hosts file in /mirror:

192.168.1.7
192.168.1.5

mpiu@pc01:/mirror$ mpiexec -n 8 -machinefile hosts ./mpi_hello
Hello from processor 2 of 8
Hello from processor 4 of 8
Hello from processor 6 of 8
Hello from processor 0 of 8
Hello from processor 1 of 8
Hello from processor 3 of 8
Hello from processor 5 of 8
Hello from processor 7 of 8
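
The output above does not show which node each rank ran on. One way to see the distribution is to launch the ordinary hostname command instead of the MPI program and count the lines per host. A small sketch, assuming mpiexec can start non-MPI executables (which I believe MPICH's launcher can); count_per_host is a hypothetical helper:

```shell
# Hypothetical helper: read one host name per line on stdin and
# print "<host> <count>" for each distinct host, sorted by name.
count_per_host() {
    awk '{ n[$0]++ } END { for (h in n) print h, n[h] }' | sort
}

# Usage from /mirror on pc01:
#   mpiexec -n 8 -machinefile hosts hostname | count_per_host
```

If both machines appear in the result, the processes were really spread across the two nodes.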

Marco

user2315094
  • the message seems coming from Open MPI (and not MPICH). you should first clarify the library you are using is the one you intend to use. – Gilles Gouaillardet Mar 09 '19 at 09:29
  • Thanks @GillesGouaillardet. Based on your right observation, I updated my question with the new output – user2315094 Mar 10 '19 at 11:39
  • you need to be able to SSH **passwordless** between nodes (the logs suggest you can SSH only if you manually type the password, and that is not enough). – Gilles Gouaillardet Mar 11 '19 at 04:09
  • Hi @GillesGouaillardet now, as described above, it seems solved, even if I just repeated exactly the same procedure for the third time. Thank you very much for your kind help – user2315094 Mar 11 '19 at 09:20
