MPI_Waitany causes segmentation fault

Question

I am using MPI to distribute images to different processes so that:

Process 0 distributes the images to the other processes.

Every process other than 0 processes its image and then sends the result back to process 0.

Process 0 tries to keep every worker busy: as soon as a process finishes an image and becomes idle, it is assigned another image to process. The code follows:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include "mpi.h"

#define MAXPROC 16    /* Max number of processes */
#define TOTAL_FILES 7

int main(int argc, char* argv[]) {
        int i, nprocs, tprocs, me, index;
        const int tag  = 42;    /* Tag value for communication */

        MPI_Request recv_req[MAXPROC];  /* Request objects for non-blocking receive */
        MPI_Request send_req[MAXPROC]; /* Request objects for non-blocking send */     
        MPI_Status status;              /* Status object for non-blocking receive */

        char myname[MPI_MAX_PROCESSOR_NAME];             /* Local host name string */
        char hostname[MAXPROC][MPI_MAX_PROCESSOR_NAME];  /* Received host names */
        int namelen;   

        MPI_Init(&argc, &argv);                /* Initialize MPI */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);    /* Get nr of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &me);    /* Get own identifier */

        MPI_Get_processor_name(myname, &namelen);  /* Get host name */
        myname[namelen++] = (char)0;              /* Terminating null byte */

        /* First check that we have at least 2 and at most MAXPROC processes */
        if (nprocs<2 || nprocs>MAXPROC) {
                if (me == 0) {
                  printf("You have to use at least 2 and at most %d processes\n", MAXPROC);
                }
                MPI_Finalize(); exit(0);
        }

        /* if TOTAL_FILES < nprocs then use only TOTAL_FILES + 1 procs */
        tprocs = (TOTAL_FILES < nprocs) ? TOTAL_FILES + 1 : nprocs;
        int done = -1;

        if (me == 0) {    /* Process 0 does this */

                int send_counter = 0, received_counter;

                for (i=1; i<tprocs; i++) {
                        MPI_Isend(&send_counter, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
                        ++send_counter;
                        /* Receive a message from all other processes */
                        MPI_Irecv (hostname[i], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[i]);
                }      

                for (received_counter = 0; received_counter < TOTAL_FILES; received_counter++){

                        /* Wait until at least one message has been received from any process other than 0*/
                        MPI_Waitany(tprocs-1, &recv_req[1], &index, &status);

                        if (index == MPI_UNDEFINED) perror("Errorrrrrrr");                     
                        printf("Received a message from process %d on %s\n", status.MPI_SOURCE, hostname[index+1]);

                        if (send_counter < TOTAL_FILES){ /* if there are still images left to process */
                                MPI_Isend(&send_counter, 1, MPI_INT, status.MPI_SOURCE, tag, MPI_COMM_WORLD, &send_req[status.MPI_SOURCE]);
                                ++send_counter;
                                MPI_Irecv (hostname[status.MPI_SOURCE], namelen, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &recv_req[status.MPI_SOURCE]);
                        }      
                }

              for (i=1; i<tprocs; i++) {
                      MPI_Isend(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &send_req[i]);
              }

        } else if (me < tprocs) { /* all other processes do this */

                int y;         
                MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);

                while (y != -1) {                                      
                        printf("Process %d: Received image %d\n", me, y);
                        sleep(me%3+1);  /* Let the processes sleep for 1-3 seconds */

                        /* Send own identifier back to process 0 */
                        MPI_Send (myname, namelen, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
                        MPI_Recv(&y, 1, MPI_INT, 0,tag,MPI_COMM_WORLD,&status);                
                }      
        }

        MPI_Finalize();
        exit(0);
}

which is based on this example.

Right now I'm getting a segmentation fault and I'm not sure why. I'm fairly new to MPI, but I can't see a mistake in the code above. It only happens with certain numbers of processes: for example, with TOTAL_FILES = 7 it crashes when run with 5, 6 or 7 processes, but it works fine with 9 processes or more.

The entire code can be found here. Trying it with 6 processes causes the mentioned error.

To compile and execute :

mpicc -Wall sscce.c -o sscce -lm 
mpirun -np 6 sscce

Can you create an http://sscce.org so we can help debug your code? — Wesley Bland, Jul 20 '14 at 14:38
@WesleyBland My code is heavily based on the example code linked above. I edited it in an attempt to make it an SSCCE. Please let me know if that is not enough for you — PALEN, Jul 21 '14 at 02:19
There is still no SSCCE here. Not even a `main` function. Did you not read the website? — Lightness Races in Orbit, Jul 21 '14 at 12:02

Hristo Iliev · Accepted Answer · 2014-07-23T22:23:23.020


It is not MPI_Waitany itself that causes the segmentation fault, but the way you handle the case in which all requests in recv_req[] have already completed (i.e. index == MPI_UNDEFINED). perror() does not stop the program, so execution continues and segfaults in the printf statement when it tries to access hostname[index+1].

All requests in the array end up completed because, with MPI_ANY_SOURCE in the receive call, the rank of the sender is not guaranteed to equal the index of the request in recv_req[]. Simply compare index and status.MPI_SOURCE after MPI_Waitany returns to see this for yourself. The subsequent calls to MPI_Irecv therefore very likely overwrite requests that have not completed yet, so the number of requests that MPI_Waitany can complete is smaller than the number of results expected.

Also note that you never wait for the send requests to complete. You are lucky that the Open MPI implementation uses an eager protocol for small messages, so they get sent even though MPI_Wait(any|all) or MPI_Test(any|all) is never called on the started send requests.

edited Jul 23 '14 at 22:23

answered Jul 21 '14 at 11:34

Hristo Iliev


Indeed, that was the problem. Question though: If I use MPI_Isend, then MPI_Irecv, and then Wait for the recv, should I still use MPI_Wait for the send? Why? If I receive a response from a process, it means that request was sent, the process received the request, processed it and made that response... – PALEN Jul 21 '14 at 15:43

The MPI standard requires that requests initiated by `MPI_Isend` and `MPI_Irecv` should be completed appropriately by either calling `MPI_Wait` or by repeatedly calling `MPI_Test` until it returns with a positive test result. MPI implementations are even allowed to postpone the data transfer until those wait/test calls are made and many implementations do that when large messages are being sent. In your case it works simply because of a performance trick known as eager send being used. Otherwise your application is non-conforming. – Hristo Iliev Jul 21 '14 at 16:01

Moreover, not completing outstanding requests results in memory being leaked and with some MPI implementations could lead to significant communication slow-downs as the library has to traverse the ever growing request queue. – Hristo Iliev Jul 21 '14 at 16:04
Thank you for your input. It was really helpful in understanding MPI better. – PALEN Jul 23 '14 at 21:26
