
I've written a program in C/MPI that splits an NxN matrix into submatrices of rows and distributes them to all processes with the MPI_Scatterv routine. The dimension N is not necessarily a multiple of the number of processes, so I decided to give one extra row to the first DIM % size processes (for example, with DIM = 4 and 3 processes, rank 0 gets 2 rows and ranks 1 and 2 get 1 row each). The code is below; it doesn't work, and I don't understand why. The error message is something like this:

job aborted: rank: node: exit code[: error message] 0: PACI: -1073741819: process 0 exited without calling finalize

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define DIM 4
#define ROOT 0

float **alloc (int, int);
void init (float **, int, int);
void print (float **, int, int);

int main(int argc, char *argv[])
{
    int rank,               
    size,               
    dimrecv,
    i;                  
    int *sendcount = NULL, *displs = NULL;
    float **matrix, **recvbuf;  

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    dimrecv = (int)(DIM / size);
    if(rank < (DIM % size))
        dimrecv += 1 ;
    recvbuf = alloc(dimrecv, DIM); 

    if (rank == ROOT) 
    {
        matrix = alloc(DIM, DIM);
        init(matrix, DIM, DIM);
        sendcount = (int*)calloc(size, sizeof(int));
        displs = (int*)calloc(size, sizeof(int));
        int total = 0;
        printf("MATRIX %d x %d", DIM, DIM);
        print(matrix, DIM, DIM);

        displs[0] = 0;
        for (i = 0; i < size; i++)
        {
            if (i < DIM % size)
                sendcount[i] = (ceil((float)DIM/size))*DIM;
            else
                sendcount[i] = (floor((float)DIM/size))*DIM;
            total += sendcount[i];
            if (i + 1 < size)
                displs[i + 1] = total;
        }
    }
    MPI_Scatterv(&(matrix[0][0]), sendcount, displs, MPI_FLOAT,
                 recvbuf, dimrecv*DIM, MPI_FLOAT, ROOT, MPI_COMM_WORLD);

    printf("\n\n");

    for (i = 0; i < size; i++)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        if (i == rank)
        {
            printf("SUBMATRIX P%d", i);
            print(recvbuf, dimrecv, DIM);
        }
    }

    free(matrix[0]);
    free(matrix);
    free(recvbuf[0]);
    free(recvbuf);
    /* quit */
    MPI_Finalize();
    return 0;
}

float **alloc(int rows, int cols)
{
    int i;
    /* one contiguous block for all the elements, so &(matrix[0][0]) can be
       passed to MPI as a single buffer */
    float *num_elem = (float *)calloc(rows*cols, sizeof(float));
    /* array of row pointers into that block */
    float **matrix = (float **)calloc(rows, sizeof(float*));
    for (i = 0; i < rows; i++)
        matrix[i] = &(num_elem[cols*i]);

    return matrix;
}

void init (float **matrix, int rows, int cols)
{
    int i, j;
    srand(time(NULL));
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            matrix[i][j] = 1 + (rand() % 5);
    }
}

void print (float **matrix, int rows, int cols)
{
    int i, j;
    for (i = 0; i < rows; i++) {
        printf("\n");
        for (j = 0; j < cols; j++)
            printf("%.1f ", matrix[i][j]);
    }
}

How can I solve the problem using dynamic allocation with a double pointer? I wrote the same program with static allocation and it works! Thanks a lot. Pax.

Pax
  • Only rank 0 has any memory allocated for `sendcount` and `displs`. You need to first have the other ranks allocate memory for these pointers, and then `MPI_Bcast` the values that they should have (which are calculated by rank 0). As a side note, it is generally considered bad form to cast the results of `malloc`, `calloc`, and co. – R_Kapp Nov 09 '15 at 17:31

2 Answers

You need to be more careful about which process/rank is allocating memory, and which process/rank is therefore freeing memory.

In your current implementation, you'll want rank == ROOT to allocate and initialize matrix. You'll want every rank to allocate and initialize sendcount and displs (otherwise, when they each enter MPI_Scatterv, how do they know exactly what they'll be receiving?). Finally, every rank also needs to allocate, but not initialize, recvbuf; that buffer is filled internally by the MPI_Scatterv routine.

[Side note: You don't technically need to have each rank initialize sendcount and displs, although this will certainly be fastest. If only the rank == ROOT process has the knowledge to calculate these values, then you'll have to MPI_Bcast both of these arrays to every process before entering the MPI_Scatterv routine.]
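For concreteness, here is a minimal sketch of that broadcast variant, dropped into the question's main() and reusing its variable names (sendcount, displs, recvbuf, dimrecv, ROOT). Passing NULL as the send buffer on the non-root ranks is just one reasonable way to avoid touching matrix, which only ROOT has allocated:

sendcount = (int *)calloc(size, sizeof(int));
displs = (int *)calloc(size, sizeof(int));

if (rank == ROOT)
{
    int total = 0;
    for (i = 0; i < size; i++)
    {
        /* the first DIM % size ranks get one extra row */
        sendcount[i] = (DIM / size + (i < DIM % size ? 1 : 0)) * DIM;
        displs[i] = total;
        total += sendcount[i];
    }
}

/* copy ROOT's counts and displacements to every rank */
MPI_Bcast(sendcount, size, MPI_INT, ROOT, MPI_COMM_WORLD);
MPI_Bcast(displs, size, MPI_INT, ROOT, MPI_COMM_WORLD);

/* only ROOT allocated matrix, so the other ranks must not dereference it */
MPI_Scatterv(rank == ROOT ? &(matrix[0][0]) : NULL, sendcount, displs, MPI_FLOAT,
             &(recvbuf[0][0]), dimrecv*DIM, MPI_FLOAT, ROOT, MPI_COMM_WORLD);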

And of course you'll then have to ensure that only the correct ranks free the correct memory they previously allocated.
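For example, with the contiguous alloc() from the question (one block of floats plus an array of row pointers), a matching cleanup could look like the sketch below; which ranks free sendcount and displs depends on which ranks ended up allocating them in your final scheme:

if (rank == ROOT)
{
    free(matrix[0]);   /* the contiguous block of DIM*DIM floats */
    free(matrix);      /* the array of row pointers */
}
free(recvbuf[0]);      /* every rank allocated its own recvbuf */
free(recvbuf);
free(sendcount);       /* here, every rank allocated these as well */
free(displs);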

The reason this worked in your static initialization is that each rank "allocated" the memory when you initially statically defined your arrays. Assuming you did this naively, you probably previously used excess memory in that implementation (because, as seen above, not every rank needs to allocate memory for every matrix/array you are using).

Hope this helps.

NoseKnowsAll

Thanks, Nose, for your suggestion. Nevertheless, the program still doesn't work correctly. The modified code is the following:

...
MPI_Bcast(sendcount, 4, MPI_INT, ROOT, MPI_COMM_WORLD);
MPI_Bcast(displs, 4, MPI_INT, ROOT, MPI_COMM_WORLD);

MPI_Scatterv(&(matrix[0][0]), sendcount, displs, MPI_FLOAT,
             recvbuf, dimrecv*DIM, MPI_FLOAT, ROOT, MPI_COMM_WORLD);

printf("\n\n");
for(i = 0; i< size; i++)
{
    MPI_Barrier(MPI_COMM_WORLD);
    if (i == rank)
    {
        printf("SUBMATRIX P%d", i);
        print(recvbuf, dimrecv, DIM);
    }
}
if (rank == ROOT) {
    for (i=0; i<DIM; i++)
        free(matrix[i]);
    free(matrix);
}
for(i=0; i<dimrecv; i++)
    free(recvbuf[i]);
free(recvbuf);
free(sendcount);
free(recvbuf);

sendcount and displs are now allocated by every rank, outside the if (rank == ROOT) block. There must be something wrong in the code that I can't catch.

Pax