I run a C++ MPI program (using Boost.MPI) on a 12-node Windows HPC cluster with 24 cores per node. I timed two runs: one with the MPI reduce, and one with the reduce commented out (for a speed test only). The run times were 01:17:23 and 01:03:49 respectively, so the reduce seems to account for a large share of the time (about 13.5 minutes, roughly 18% of the run). I think it might be worthwhile to reduce at the node level first, and then reduce across nodes to the head node, to improve performance.
Below is a simple test example. Suppose there are 4 compute nodes with 2 cores each. I want to first reduce within each node using MPI, and then reduce the per-node results to the head node. I am not very familiar with MPI, and the program below crashes.
    #include <iostream>
    #include <boost/mpi.hpp>
    namespace mpi = boost::mpi;
    using namespace std;

    int main()
    {
        mpi::environment env;
        mpi::communicator world;
        int i = world.rank();

        boost::mpi::communicator local = world.split(world.rank()/2); // total 8 cores, divide in 4 groups
        boost::mpi::communicator heads = world.split(world.rank()%4);

        int res = 0;
        boost::mpi::reduce(local, i, res, std::plus<int>(), 0);
        if(world.rank()%2==0)
            cout<<res<<endl;

        boost::mpi::reduce(heads, res, res, std::plus<int>(), 0);
        if(world.rank()==0)
            cout<<res<<endl;

        return 0;
    }
The output is illegible, with one character per line, something like this:
Z
h
h
h
h
a
a
a
a
n
n
n
n
g
g
g
g
\
\
\
\
b
b
b
b
o
o
o
o
o
o
o
o
s
...
...
...
The error message is
Test.exe ended prematurely and may have crashed. exit code 3
I suspect I did something wrong with the communicator split or the reduce, but I could not figure it out after several attempts. What do I need to change to make this work? Thanks.
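To make the intended grouping concrete, here is a small MPI-free sketch that only simulates which world ranks would share a communicator for a given split colour, and what a two-level sum of the rank numbers should produce. The helper names `group_by_color`, `node_sums`, and `two_level_sum` are hypothetical and not part of Boost.MPI; the sketch assumes only that `split` puts ranks with equal colour into the same sub-communicator. I have not confirmed whether the grouping produced by `world.rank()%4`, or passing `res` as both input and output in the second reduce, is what triggers the crash.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical helper (not Boost.MPI): collect ranks 0..n_ranks-1 by colour,
// mimicking how a communicator split groups ranks that pass the same colour.
template <typename ColorFn>
std::map<int, std::vector<int>> group_by_color(int n_ranks, ColorFn color)
{
    assert(n_ranks >= 0);
    std::map<int, std::vector<int>> groups;
    for (int rank = 0; rank < n_ranks; ++rank)
        groups[color(rank)].push_back(rank);
    return groups;
}

// Hypothetical helper: first-level result, one partial sum per node, where
// each rank contributes its own world rank (as `i` does in the program).
std::vector<int> node_sums(int n_ranks, int cores_per_node)
{
    assert(cores_per_node > 0);
    const auto nodes = group_by_color(
        n_ranks, [cores_per_node](int r) { return r / cores_per_node; });

    std::vector<int> sums;
    for (const auto& node : nodes)  // std::map iterates in node order
        sums.push_back(std::accumulate(node.second.begin(),
                                       node.second.end(), 0));
    return sums;
}

// Hypothetical helper: second-level result, the sum of the per-node sums.
int two_level_sum(int n_ranks, int cores_per_node)
{
    const std::vector<int> sums = node_sums(n_ranks, cores_per_node);
    return std::accumulate(sums.begin(), sums.end(), 0);
}
```

For 8 ranks and 2 cores per node, `node_sums(8, 2)` gives 1, 5, 9, 13 and `two_level_sum(8, 2)` gives 28, which is the output I expect from the MPI program. Grouping by `rank / 2` yields {0,1}, {2,3}, {4,5}, {6,7}, while grouping by `rank % 4` yields {0,4}, {1,5}, {2,6}, {3,7}, so the second split does not put one rank per node into a single group.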