To avoid allocating an intermediate buffer, it makes sense in my application for MPI_Recv to receive one single big array. On the sending side, however, the data is non-contiguous, and I'd like to make each piece available to the network interface as soon as it can be organized. Something like this:
MPI_Request reqs[N];
for(/* each one of my N chunks */) {
    partial_send(chunk, &reqs[chunk->idx]);
}
MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
Or, even better for me, something like POSIX's writev function:
/* Precalculate this. */
struct iovec iov[N];
for(/* each one of my N chunks */) {
    iov[chunk->idx].iov_base = chunk->ptr;
    iov[chunk->idx].iov_len = chunk->len;
}
/* Done every time I need to send. */
MPI_Request req;
chunked_send(iov, &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);
Is such a thing possible in MPI?