
I installed mpich2 on my Ubuntu 14.04 laptop with the following command:

sudo apt-get install libcr-dev mpich2 mpich2-doc

This is the code I'm trying to execute:

#include <mpi.h>
#include <stdio.h>

int main()
{
    int myrank, size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world! I am %d of %d\n", myrank, size);

    MPI_Finalize();
    return 0;
}

Compiling it with mpicc helloworld.c gives no errors. But when I run the program with mpirun -np 5 ./a.out, there is no output; the program just keeps running as if it were stuck in an infinite loop. When I press Ctrl+C, this is what I get:

$ mpirun -np 5 ./a.out
^C[mpiexec@user] Sending Ctrl-C to processes as requested
[mpiexec@user] Press Ctrl-C again to force abort
[mpiexec@user] HYDU_sock_write (./utils/sock/sock.c:291): write error (Bad file descriptor)
[mpiexec@user] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:170): unable to write data to proxy
[mpiexec@user] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@user] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@user] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@user] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

I couldn't find a solution by searching online. What is causing this error?

Nitin Labhishetty
  • Try `int main(int argc, char *argv[])` and `MPI_Init(&argc, &argv);`. That said, as stated in the [documentation of MPICH2](http://www.mpich.org/static/docs/v3.1/www3/MPI_Init.html), `MPI_Init()` accepts NULL for both arguments. Try `which mpicc`, `which mpirun` and `mpirun -version` to check whether mpirun and mpicc both belong to mpich. If mpicc refers to mpich and mpirun to openmpi, strange things can happen. Also try `mpirun -np 5 a.out`. – francis Mar 16 '15 at 08:17
  • Thanks for the reply. I tried adding argc and argv, and also `mpirun -np 5 a.out`, but I get the same error. `which mpicc` gives `/usr/bin/mpicc` and `which mpirun` gives `/usr/bin/mpirun`. `mpirun -version` printed a lot of output with several occurrences of mpich, so I'm guessing it refers to mpich. What more can I try? – Nitin Labhishetty Mar 16 '15 at 08:47
  • `mpiexec` is giving the same error. – Nitin Labhishetty Mar 16 '15 at 10:39
  • The other comments are getting at the most common reason for this problem: a mismatch between MPI implementations. Do you have any other MPI installed, such as OpenMPI or an older version of MPICH that you built yourself? – Rob Latham Mar 16 '15 at 16:25
  • Could you try `mpicc helloworld.c -v` to print the programs called by the compiler? Either `openmpi` or `mpich` is linked, and it will appear either in `GCC_OPTIONS` as an include path or in the call to `collect2` as a library. – francis Mar 16 '15 at 17:24
  • I tried `mpicc helloworld.c -v`. It shows that mpich is linked. There is no reference of openmpi anywhere in it. – Nitin Labhishetty Mar 17 '15 at 05:04
  • I started getting this problem too. The twist is that sometimes `mpirun -n 4 ./sth.out` works just fine, but most of the time it does nothing and ends with those mpiexec error messages. I haven't been able to find any pattern so far. @codeln: have you found a solution? – rtime Apr 26 '16 at 16:57
  • Wow! I don't remember asking this question at all! I've switched from mpich2 to openmpi. Haven't been facing any issues since then. I also resolved several issues just by updating gcc to the latest version. Hope this helps. – Nitin Labhishetty Apr 26 '16 at 20:23


I was getting the same issue with two compute nodes:

$ mpirun -np 10 -ppn 5 --hosts c1,c2 ./a.out  
[mpiexec@c1] Press Ctrl-C again to force abort
[mpiexec@c1] HYDU_sock_write (utils/sock/sock.c:286): write error (Bad file descriptor)
[mpiexec@c1] HYD_pmcd_pmiserv_send_signal (pm/pmiserv/pmiserv_cb.c:169): unable to write data to proxy
[mpiexec@c1] ui_cmd_cb (pm/pmiserv/pmiserv_pmci.c:79): unable to send signal downstream
[mpiexec@c1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@c1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@c1] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion

It turned out that node c1 couldn't ssh into c2.

If you are only using a single machine, you can try using fork as the launcher:

mpirun -launcher fork -np 5 ./a.out
jyvet