
To avoid allocating an intermediary buffer, it makes sense in my application for MPI_Recv to receive one single big array. On the sending side, however, the data is non-contiguous, and I'd like to hand it to the network interface as soon as each piece can be organized. Something like this:

MPI_Request reqs[N];
for(/* each one of my N chunks */) {
    partial_send(chunk, &reqs[chunk->idx]);
}

MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);

Or, even better for me, something like POSIX's writev function:

/* Precalculate this. */
struct iovec iov[N];
for(/* each one of my N chunks */) {
    iov[chunk->idx].iov_base = chunk->ptr;
    iov[chunk->idx].iov_len = chunk->len;
}

/* Done every time I need to send. */
MPI_Request req;
chunked_send(iov, &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);

Is such a thing possible in MPI?

lvella
  • Is your data unevenly non-contiguous, or is there a reason why you aren't creating an MPI derived datatype to describe all of your data on the sending side? That would allow you to send your data all at once instead of dealing with partial sends. – NoseKnowsAll Apr 13 '16 at 01:42
  • It is an array of non-contiguous double. – lvella Apr 13 '16 at 01:58
  • You can also create an MPI derived data type for non-contiguous data. Unfortunately, in my experience there is often no advantage in creating such a datatype over copying the data into a sending buffer manually. If you have multiple MPI_Isends you also need multiple MPI_Recvs, I fear. Still, those could all receive into the same large array with different starting points. – haraldkl Apr 13 '16 at 05:24
  • Derived MPI datatypes are the way to go. It won't perform worse than sending multiple messages and will also benefit on platforms where the MPI library is able to translate the datatype into a gathered read by the network equipment. – Hristo Iliev Apr 13 '16 at 14:05

1 Answer


I'd like to simply comment but can't, as I am new to Stack Overflow and don't have sufficient reputation ...

If all your chunks are aligned on regular boundaries (e.g. they're pointers into some larger contiguous array), then you should use MPI_Type_indexed, where the displacements and counts are all measured in multiples of the basic type (here it's MPI_DOUBLE, I guess). However, if the chunks have, for example, been individually malloc'd and there's no guarantee of alignment, then you'll need the more general MPI_Type_create_struct, which specifies displacements in bytes (and also allows a different type for each block, which you don't require).
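For the aligned case, a rough sketch of what I mean (the names base, offsets, lengths and nchunks are placeholders for however you describe your chunks; both the block lengths and the displacements are counted in doubles, not bytes):

#include <mpi.h>

/* Send nchunks pieces of one big double array "base" in a single message.
 * lengths[i] and offsets[i] are measured in numbers of doubles. */
void send_chunks_indexed(const double *base, const int *lengths,
                         const int *offsets, int nchunks,
                         int dest, int tag, MPI_Comm comm)
{
    MPI_Datatype chunked;

    MPI_Type_indexed(nchunks, lengths, offsets, MPI_DOUBLE, &chunked);
    MPI_Type_commit(&chunked);

    /* One send covers all the chunks at once. */
    MPI_Send(base, 1, chunked, dest, tag, comm);

    MPI_Type_free(&chunked);
}

The receiving side is then exactly what you asked for: a single MPI_Recv of the total number of doubles into one contiguous array, because the type signature (a sequence of doubles) matches on both ends.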

I was worried that you might have to do some sorting to ensure that you scan linearly through memory so the displacements never go backwards (i.e. they are "monotonically nondecreasing"). However, I believe this is only a constraint if you are going to use the types for file IO with MPI-IO rather than for point-to-point send/recv.
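For completeness, a sketch of the individually malloc'd case with MPI_Type_create_struct (ptrs, lens and nchunks are again placeholder names; since this only goes into send/recv, the chunks can be listed in whatever order they come in):

#include <stdlib.h>
#include <mpi.h>

/* Each chunk was allocated separately, so describe the layout with absolute
 * addresses (byte displacements) and send from MPI_BOTTOM. */
void send_chunks_struct(double *const ptrs[], const int lens[], int nchunks,
                        int dest, int tag, MPI_Comm comm)
{
    MPI_Aint     *disps = malloc(nchunks * sizeof(MPI_Aint));
    MPI_Datatype *types = malloc(nchunks * sizeof(MPI_Datatype));
    MPI_Datatype  chunked;
    int i;

    for (i = 0; i < nchunks; ++i) {
        MPI_Get_address(ptrs[i], &disps[i]);   /* absolute address of chunk i */
        types[i] = MPI_DOUBLE;
    }

    MPI_Type_create_struct(nchunks, lens, disps, types, &chunked);
    MPI_Type_commit(&chunked);

    /* With absolute displacements the buffer argument is MPI_BOTTOM. */
    MPI_Send(MPI_BOTTOM, 1, chunked, dest, tag, comm);

    MPI_Type_free(&chunked);
    free(types);
    free(disps);
}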

David Henty