
I am currently learning to use MPI and trying to scatter blocks of my array to the rest of my processors.

My root processor is the last one (nproc-1), and I am generating the array in that processor. In the next iteration of my code it will be a random array.

For all my processors I am allocating contiguous memory using calloc, both for 'array' and 'grain'. Grain stores the data to process, and since I need the rows above and below from the original array, I made it of size grain_length+2.

My issue is that I get the correct data from the original array except for the last two values (see output example below).

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int         i, j, m;
    int         array_size, grain_length;
    int         rc, rank, nproc;
    MPI_Status  status;

    rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS)
    {
        printf("Error starting MPI Program.\n");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    array_size = 8;

    grain_length = array_size / nproc;

    double **array= (double **) calloc(array_size, sizeof (double *));
    for (i = 0; i < array_size; i++)
        array[i] = (double *) calloc(array_size, sizeof (double));

    double **grain = (double **) calloc(grain_length+2, sizeof (double *));
    for (i = 0; i < grain_length + 2; i++)
        grain[i] = (double *) calloc(array_size, sizeof (double));

    if (array == NULL || grain == NULL)
    {
        printf("Memory could not be allocated for the arrays.");
        exit(EXIT_FAILURE);
    }

    if (rank == nproc-1)
    {
        for (i = 0; i < array_size; i++) 
        {
            for (j = 0; j < array_size; j++) 
            {
                //array[i][j] = rand() % 10;
                array[i][j] = i+j;
            }
        }
    }

    MPI_Scatter(
            &array[0][0], grain_length*array_size, MPI_DOUBLE,
            &grain[1][0], grain_length*array_size, MPI_DOUBLE, 
            nproc-1, MPI_COMM_WORLD);

    for (m = 0; m < nproc; m++)
    {
        if (rank == m) 
        {
            printf("Grain from processor %d:\n", rank);
            for (i = 0; i < grain_length+2; i++)
            {
                for (j = 0; j < array_size; j++)
                {
                    printf("%f\t", grain[i][j]);
                }
                printf("\n");
            }
            printf("\n");
        }   
        MPI_Barrier(MPI_COMM_WORLD);
    }

    if (rank == nproc-1)
    {
        printf("Array from processor %d:\n", rank);
        for (i = 0; i < array_size; i++)
        {
            for (j = 0; j < array_size; j++)
            {
                printf("%f\t", array[i][j]);
            }
            printf("\n");
        }
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

Here is the output. In Grain 0, the first and last rows are 0s, as expected, since the rows from above and below will be sent and placed there. The second row is correct, but the third row is missing the 7 and 8 values, which are the first values in Grain 1.

Are the two 0s in Grain 0 the addresses of the array's two pointers? I don't understand why I am getting incomplete data when the array is stored contiguously in memory.

I tried to use MPI_Scatterv with displacements, but I am not sure I understand how it works. I also tried to create an MPI type but didn't get far with that either.
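For reference, this is roughly what I was aiming for with MPI_Scatterv, assuming the send buffer were one contiguous block (the counts/displs names are just placeholders, and this may well be where I go wrong):

int *counts = (int *) malloc(nproc * sizeof (int));
int *displs = (int *) malloc(nproc * sizeof (int));
for (i = 0; i < nproc; i++)
{
    counts[i] = grain_length * array_size;      /* elements sent to rank i  */
    displs[i] = i * grain_length * array_size;  /* offset of rank i's block */
}
MPI_Scatterv(&array[0][0], counts, displs, MPI_DOUBLE,
        &grain[1][0], grain_length * array_size, MPI_DOUBLE,
        nproc - 1, MPI_COMM_WORLD);
free(counts);
free(displs);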

What I managed to do is broadcast each row of the array to all the other processors, but I think it is quite inefficient. This is how I did it.

for (i=0; i < array_size; i++) 
     MPI_Bcast(&array[i][0], array_size, MPI_DOUBLE, nproc-1, MPI_COMM_WORLD);

Many thanks in advance for your help!!

Grain from processor 0:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
0.000000    1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    
1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    0.000000    0.000000    
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 1:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
7.000000    8.000000    0.000000    0.000000    2.000000    3.000000    4.000000    5.000000    
8.000000    9.000000    0.000000    0.000000    3.000000    4.000000    0.000000    0.000000    
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 2:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   0.000000    0.000000    
6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   0.000000    0.000000    
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 3:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
0.000000    0.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   
0.000000    0.000000    6.000000    7.000000    8.000000    9.000000    0.000000    0.000000    
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Array from processor 3:
0.000000    1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    
1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    
2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    
3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   
4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   
5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   
6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   
7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   14.000000   
  • Your 2D arrays must be in contiguous memory. The simplest option with modern C is to declare `double array[array_size][array_size]`. If this is not an option, a quick search on SO will point you in the right direction on how to dynamically allocate a 2D array in contiguous memory. – Gilles Gouaillardet Dec 06 '19 at 00:15
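Following up on that comment, here is a minimal sketch (not from the original post) of one common way to allocate a 2D array in contiguous memory; the helper name alloc_contiguous_2d is made up for illustration.

/* Sketch only: allocate rows*cols doubles in one block, plus an array of
 * row pointers so the usual array[i][j] indexing keeps working. */
double **alloc_contiguous_2d(int rows, int cols)
{
    double *data = (double *) calloc((size_t) rows * cols, sizeof (double));
    double **table = (double **) calloc(rows, sizeof (double *));
    if (data == NULL || table == NULL)
        return NULL;
    for (int i = 0; i < rows; i++)
        table[i] = data + (size_t) i * cols;  /* row i starts i*cols into the block */
    return table;
}

With this layout, &array[0][0] really is the start of rows*cols consecutive doubles, so sending grain_length*array_size elements per rank with MPI_Scatter lines up with whole rows of the matrix.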

1 Answer


I was able to achieve the result you want by changing the amount of data sent to each individual process from:

MPI_Scatter(
        &array[0][0], grain_length*array_size, MPI_DOUBLE,
        &grain[1][0], grain_length*array_size, MPI_DOUBLE, 
        nproc-1, MPI_COMM_WORLD);

To:

MPI_Scatter(
        &array[0][0], 4+grain_length*array_size, MPI_DOUBLE,
        &grain[1][0], 4+grain_length*array_size, MPI_DOUBLE, 
        nproc-1, MPI_COMM_WORLD);

The result:

    Grain from processor 0:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
0.000000    1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    
1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 1:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    
3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 2:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   
5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Grain from processor 3:
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   
7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   14.000000   
0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    

Array from processor 3:
0.000000    1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    
1.000000    2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    
2.000000    3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    
3.000000    4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   
4.000000    5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   
5.000000    6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   
6.000000    7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   
7.000000    8.000000    9.000000    10.000000   11.000000   12.000000   13.000000   14.000000   

I hope this helps.
