I am working on a C++ application, where I use the MPI C bindings to send and receive data over a network. I understand that sending
const int VECTOR_SIZE = 1000000;
std::vector<int> vector(VECTOR_SIZE, 0);
via
// Version A
MPI_Send(const_cast<int *>(vector.data()), vector.size(), MPI_INT, 1, 0, MPI_COMM_WORLD);
is much more efficient than
// Version B
for (const auto &element : vector)
    MPI_Send(const_cast<int *>(&element), 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
because every MPI_Send call incurs its own per-message latency.
However, if I want to send a data structure that is not contiguous in memory (a std::list<int>, for instance), I cannot use version A. I either have to resort to version B, or first copy the list's contents into a contiguous container (such as a std::vector<int>) and then use version A. Since I want to avoid the extra copy, I wonder whether MPI offers any options or other functions that make Version B (or at least a similar, loop-like construct) efficient, without incurring the latency on every call to MPI_Send.
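For reference, the copy-based fallback I would like to avoid looks roughly like the sketch below. The helper name to_contiguous is made up for illustration and is not part of MPI. The MPI_Send call is shown only as a comment so the snippet compiles without an MPI installation.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical helper (not an MPI function): copies a non-contiguous
// container into one contiguous staging buffer, so the data can be sent
// as a single message instead of one message per element.
std::vector<int> to_contiguous(const std::list<int> &values)
{
    // The range constructor walks the list once and copies every element.
    std::vector<int> buffer(values.begin(), values.end());
    assert(buffer.size() == values.size());
    return buffer;
}

// Intended use on the sending rank (requires <mpi.h>):
//   std::vector<int> buffer = to_contiguous(my_list);
//   MPI_Send(buffer.data(), static_cast<int>(buffer.size()), MPI_INT,
//            1, 0, MPI_COMM_WORLD);
```

This pays for one pass over the list and one temporary allocation per send, which is exactly the overhead I am hoping to avoid.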