I want to transfer a large number of std::complex<double> values using Boost.MPI. The Boost.MPI tutorial explains:
To obtain optimal performance for small fixed-length data types not containing any pointers it is very important to mark them using the type traits of Boost.MPI and Boost.Serialization.
It was already discussed that fixed-length types containing no pointers can be marked with is_mpi_datatype, e.g.:
namespace boost { namespace mpi {
  template <> struct is_mpi_datatype<gps_position> : mpl::true_ { };
} }
or with the equivalent macro BOOST_IS_MPI_DATATYPE(gps_position).
The documentation of is_mpi_datatype gives another example:
[...] To do so, first make the data type Serializable (using the Boost.Serialization library); then, specialize the is_mpi_datatype trait for the point type so that it will derive mpl::true_:
namespace boost { namespace mpi {
  template<> struct is_mpi_datatype<point> : public mpl::true_ { };
} }
When I try exactly that to optimize performance [options (A) or (B) in my attempt below], I observe that Boost.MPI neither uses the builtin MPI datatype MPI_DOUBLE_COMPLEX nor maps std::plus to the MPI_SUM operation [assertions (2) and (3) in my attempt below]. Moreover, enabling (A) or (B) and disabling assertions (2) and (3) yields a segmentation fault at runtime.
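To make the observation concrete, here is a minimal sketch of how I understand the relationship between the traits. It is a simplified stand-in model in a hypothetical namespace `model`, written with `std::true_type` instead of `mpl::true_`; it is not the Boost.MPI source, and the assumption that "builtin" and "op" are derived from the category traits (and not from is_mpi_datatype) is my reading of the headers.

```cpp
#include <complex>
#include <functional>
#include <type_traits>

namespace model {  // hypothetical stand-in, NOT namespace boost::mpi

// Category traits: false unless a type is explicitly registered.
template <typename T> struct is_mpi_floating_point_datatype : std::false_type {};
template <typename T> struct is_mpi_complex_datatype : std::false_type {};
template <> struct is_mpi_floating_point_datatype<double> : std::true_type {};

// Assumption: "builtin" is computed from the category traits only.
template <typename T>
struct is_mpi_builtin_datatype
    : std::integral_constant<bool,
                             is_mpi_floating_point_datatype<T>::value ||
                             is_mpi_complex_datatype<T>::value> {};

// Defaults to "builtin", but users may specialize it on its own.
template <typename T>
struct is_mpi_datatype : is_mpi_builtin_datatype<T> {};

// Assumption: std::plus maps to a predefined op only for the categories.
template <typename Op, typename T>
struct is_mpi_op : std::false_type {};

template <typename T>
struct is_mpi_op<std::plus<T>, T>
    : std::integral_constant<bool,
                             is_mpi_floating_point_datatype<T>::value ||
                             is_mpi_complex_datatype<T>::value> {};

// Like options (A)/(B): only is_mpi_datatype is specialized.
template <> struct is_mpi_datatype<std::complex<double>> : std::true_type {};

// Like the trait half of option (D), shown on a different type.
template <> struct is_mpi_complex_datatype<std::complex<float>> : std::true_type {};

}  // namespace model

using dcomplex = std::complex<double>;
using fcomplex = std::complex<float>;

// Marking the type alone satisfies only the first trait ...
static_assert(model::is_mpi_datatype<dcomplex>::value, "marked as datatype");
static_assert(!model::is_mpi_builtin_datatype<dcomplex>::value, "not builtin");
static_assert(!model::is_mpi_op<std::plus<dcomplex>, dcomplex>::value, "no op");

// ... whereas registering the category satisfies all three.
static_assert(model::is_mpi_datatype<fcomplex>::value, "datatype");
static_assert(model::is_mpi_builtin_datatype<fcomplex>::value, "builtin");
static_assert(model::is_mpi_op<std::plus<fcomplex>, fcomplex>::value, "op");
```

In this model, specializing is_mpi_datatype by itself can never flip the builtin or op traits, which matches what I see with assertions (1), (2) and (3) in the listing below. The model says nothing about which MPI_Datatype handle is used at runtime.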
In some source file of Boost.MPI I have found an undocumented(?) macro called BOOST_MPI_DATATYPE which does the right thing, but is marked with the comment /// INTERNAL ONLY.
Before implementing this ugly hack(?) I would like to ask: What is the intended way to tell Boost.MPI to use the builtin MPI_DOUBLE_COMPLEX datatype for std::complex<double>?
#include <complex>
#include <functional>
#include <iostream>
#include <boost/mpi.hpp>
#include <boost/mpi/operations.hpp>
#include <boost/serialization/complex.hpp>
// tested with GCC 6.2.0, OpenMPI 2.0.1, boost 1.62.0
// mpic++ -lboost_mpi -lboost_serialization boost-mpi-complex.cpp
using dcomplex = std::complex<double>;
using dcplus = std::plus<dcomplex>;
////////////////////////////////////////////////////////////////////////////////
// How to pass assertions (1) to (5) below?
////////////////////////////////////////////////////////////////////////////////
// (A): documented, but fails assertions (2) and (3)
// if (2) and (3) are removed with this OPTION: segmentation fault at runtime
//BOOST_IS_MPI_DATATYPE(dcomplex)
// (B): documented, but same problems as (A)
namespace boost::mpi {
//template<> struct is_mpi_datatype<dcomplex> : boost::mpl::true_ {};
}
// (C): works, but not documented(?) and has `INTERNAL ONLY` comment in source
namespace boost::mpi {
//BOOST_MPI_DATATYPE(dcomplex, MPI_DOUBLE_COMPLEX, complex);
}
// (D): works, equivalent to (C)
// BUT if `is_mpi_complex_datatype` is specialized without `get_mpi_datatype`
// then compilation is fine with all assertions, but running yields segfault
namespace boost::mpi {
//template<> inline MPI_Datatype get_mpi_datatype<dcomplex>(const dcomplex&) {
// return MPI_DOUBLE_COMPLEX;
//}
//template<> struct is_mpi_complex_datatype<dcomplex> : boost::mpl::true_ {};
}
// optional; works as expected for assertion (4)
namespace boost::mpi {
template<> struct is_commutative<dcplus, dcomplex> : mpl::true_ {};
}
// If these assertions are removed and none of (A) to (D) is activated then
// everything works as expected, but I would like to optimize serialization
static_assert(boost::mpi::is_mpi_datatype<dcomplex>{}); // (1)
static_assert(boost::mpi::is_mpi_builtin_datatype<dcomplex>{}); // (2)
static_assert(boost::mpi::is_mpi_op<dcplus, dcomplex>{}); // (3)
static_assert(boost::mpi::is_commutative<dcplus, dcomplex>{}); // (4)
static_assert(boost::serialization::is_bitwise_serializable<dcomplex>{}); // (5)
int main() {
    boost::mpi::environment env{};
    boost::mpi::communicator world{};
    constexpr size_t N = 4;
    dcomplex data[N]{};
    // rank 0 contributes the real parts, rank 1 the imaginary parts
    if(0 == world.rank()) {
        for(size_t i=0; i<N; ++i) data[i] = dcomplex{double(i), 0.0};
    }
    if(1 == world.rank()) {
        for(size_t i=0; i<N; ++i) data[i] = dcomplex{0.0, double(N+i)};
    }
    all_reduce(world, boost::mpi::inplace(data), N, dcplus{});
    if(0 == world.rank()) {
        for(auto&& x : data) std::cout << x << std::endl;
    }
    return 0;
}
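For reference, the result I expect from the all_reduce can be reproduced without MPI. The following is a serial sketch only: `serial_all_reduce`, `make_buffers` and `format_lines` are hypothetical helper names of mine, and the per-rank buffers are simply modeled as vectors initialized the same way as in main() above.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

using dcomplex = std::complex<double>;

// Serial stand-in for all_reduce with std::plus: element-wise sum
// over one buffer per (simulated) rank.
std::vector<dcomplex> serial_all_reduce(
        const std::vector<std::vector<dcomplex>>& per_rank) {
    assert(!per_rank.empty());
    std::vector<dcomplex> result(per_rank.front().size());
    for (const auto& buf : per_rank) {
        assert(buf.size() == result.size());
        for (std::size_t i = 0; i < buf.size(); ++i) {
            result[i] = std::plus<dcomplex>{}(result[i], buf[i]);
        }
    }
    return result;
}

// Buffers initialized as in main(): rank 0 holds the real parts,
// rank 1 the imaginary parts, all further ranks hold zeros.
std::vector<std::vector<dcomplex>> make_buffers(std::size_t n_ranks,
                                                std::size_t n) {
    std::vector<std::vector<dcomplex>> bufs(n_ranks, std::vector<dcomplex>(n));
    for (std::size_t i = 0; i < n; ++i) {
        if (n_ranks > 0) bufs[0][i] = dcomplex{double(i), 0.0};
        if (n_ranks > 1) bufs[1][i] = dcomplex{0.0, double(n + i)};
    }
    return bufs;
}

// One value per line, using the same operator<< as the program above.
std::string format_lines(const std::vector<dcomplex>& values) {
    std::ostringstream os;
    for (const auto& x : values) os << x << '\n';
    return os.str();
}
```

With two or more ranks and N = 4, `format_lines(serial_all_reduce(make_buffers(2, 4)))` gives the lines (0,4), (1,5), (2,6) and (3,7), which is what rank 0 should print if the reduction works.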