
I want to send large messages in MPI, larger than 2^31 bytes (the elements could be char, double, or anything else, but I'll use bytes here). The point is to get around the limit imposed by the int count argument.

I have this code, which sends just a bit more than 2^31 B by sending 2050 MB (2050 × 1048576 bytes). On the receiver side I use MPI_Probe and MPI_Get_count to determine the message size dynamically. I run the code on 2 ranks.

#include<mpi.h>
#include<stdio.h>
#include<vector>
#include<cassert>

using namespace std;

int main()
{

    MPI_Init(NULL, NULL);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    MPI_Datatype MPI_MEGABYTE_TYPE;
    int mega = 1048576;
    MPI_Type_contiguous(mega, MPI_BYTE, &MPI_MEGABYTE_TYPE);
    MPI_Type_commit(&MPI_MEGABYTE_TYPE);

    int count = 2050;
    size_t length = static_cast<size_t>(mega) * static_cast<size_t>(count);
    vector<char> buffer(length);
    if(my_rank == 0) {
        MPI_Send(buffer.data(), count, MPI_MEGABYTE_TYPE, 1, 0, MPI_COMM_WORLD);
    } else {
        {
            MPI_Status mpi_status;
            int mpi_count = 0;
            MPI_Probe(0, 0, MPI_COMM_WORLD, &mpi_status);
            MPI_Get_count(&mpi_status, MPI_MEGABYTE_TYPE, &mpi_count);
            printf("get count before = %d\n", mpi_count);
        }
        {
            MPI_Status mpi_status;
            int mpi_count = 0;
            MPI_Recv(buffer.data(), count, MPI_MEGABYTE_TYPE, 0, 0, MPI_COMM_WORLD, &mpi_status);
            MPI_Get_count(&mpi_status, MPI_MEGABYTE_TYPE, &mpi_count);
            printf("get count after = %d\n", mpi_count);
        }
    }

    // Release the derived datatype before shutting MPI down.
    MPI_Type_free(&MPI_MEGABYTE_TYPE);
    MPI_Finalize();
}

Depending on the MPI implementation I use, I usually get the following (for example with MPICH 3.3.2 or OpenMPI 4.0.3 on Mac, or with other versions of OpenMPI on a Linux cluster):

get count before = 2050
get count after = 2050

but sometimes (in particular with MVAPICH2-2.3 on a Linux cluster) I get

get count before = 2048
get count after = 2050

Am I correct in saying that 2048 is wrong (and hence that this implementation has a bug) and that it should be 2050? 2050 seems like the obviously right answer, but MPI is sometimes tricky, and I could have missed something. I haven't been able to find a clear answer in the standard (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf) on how MPI_Get_count behaves with derived datatypes.

Shawn
  • Looks like a bug in MVAPICH2! – Gilles Gouaillardet Apr 13 '20 at 23:14
  • The MVAPICH2 changelog mentions "Fix issues in handling very large messages with CMA" in version 2.3.2. CMA most likely stands for Cross-Memory Attach, which is a Linux mechanism for transfer of data between processes without an intermediate shared memory buffer. – Hristo Iliev Apr 14 '20 at 11:28
