boost mpi nonblocking + hierarchy gather

Question

I have a multi-process program written in Visual C++ with Boost MPI. Each process does its part, and at the end process 0 gathers all the results and summarizes them. Below is an excerpt of the code (poolsummary is a class that uses Boost serialization):

    if (rank == 0) {
        // Root receives one poolsummary per rank, then merges them into element 0.
        vector<poolsummary> ps_;
        vector<poolsummary> ps2_;
        gather(world, ps, ps_, 0);
        gather(world, ps2, ps2_, 0);
        for (int i = 1; i < size; i++) {
            ps_[0].updateFromPool(ps_[i]);
            ps2_[0].updateFromPool(ps2_[i]);
        }
        ps_[0].Save_file(asp.SCENARIO_PATH);
        ps2_[0].Save_file2(asp.SCENARIO_PATH);
        // Swap with empty vectors to release the gathered memory.
        vector<poolsummary>().swap(ps_);
        vector<poolsummary>().swap(ps2_);
    } else {
        // Non-root ranks only contribute their local values.
        gather(world, ps, 0);
        gather(world, ps2, 0);
    }

The program still needs to gather two additional classes (let us call them hist and rep).

I usually run this program on 64 processors, and the gather step has a long tail. I can think of two ways that might improve the performance:

1. Use a non-blocking gather or something similar.
2. Split the processes into 8 groups (e.g. processes 0-7 as group 1, processes 8-15 as group 2, ...), first gather within each group, and then gather across the groups.

Could someone tell me whether these solutions would work? If not, what are some other ways to improve the performance? If so, how would I implement them? Thanks so much for your time.
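The grouping described in the second idea comes down to a small piece of rank arithmetic, sketched below in plain C++ so it compiles without MPI. The names `GroupPlacement` and `place_rank` are hypothetical and not part of Boost.MPI. The assumption is that the `group` value would be passed as the color, and `local_rank` as the key, to `boost::mpi::communicator::split(color, key)`, which returns a sub-communicator containing the ranks that share a color.

```cpp
#include <cassert>

// Hypothetical helper type: where one world rank sits in a two-level gather.
struct GroupPlacement {
    int group;       // color for the split: ranks with the same value share a group
    int local_rank;  // position inside the group (usable as the split key)
    bool is_leader;  // true for local rank 0, which would forward the group's result
};

// Map a world rank to its group, assuming contiguous groups of equal size
// (ranks 0-7 form group 0, ranks 8-15 form group 1, and so on for size 8).
GroupPlacement place_rank(int rank, int group_size) {
    assert(rank >= 0 && group_size > 0);
    const int local = rank % group_size;
    return GroupPlacement{rank / group_size, local, local == 0};
}
```

With 64 ranks and `group_size` 8, rank 9 lands in group 1 as local rank 1, and the leaders are ranks 0, 8, ..., 56. Each group would gather to its leader, the leader would merge the partial results with `updateFromPool`, and a second communicator holding only the leaders would gather to world rank 0. Whether this shortens the tail depends on where the time actually goes, so it is worth measuring first.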

First you should figure out where the time is spent in `MPI_Gather()`. I suggest you insert an `MPI_Barrier()` before it and time both collective calls. That will tell you whether the time is spent in the communication itself or whether the delay is caused by load imbalance between MPI tasks. `MPI_Igather()` is the non-blocking version of `MPI_Gather()`; if your MPI implementation provides a progress thread, that can be an option too (make sure the root rank posts the collective as early as possible). — Gilles Gouaillardet, Jun 11 '19 at 00:06
@GillesGouaillardet, thanks. This is really helpful. It turns out the gather accounts for only a small portion of the time. I will focus on improving the other parts. — user11594134, Jun 11 '19 at 20:58
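The measurement suggested in the first comment could look roughly like the sketch below. It uses only the standard library, with callables standing in for the MPI calls, so `PhaseTiming`, `time_phase` and `wait_fraction` are hypothetical names rather than anything from Boost.MPI. In the real program the first callable would presumably be `world.barrier()` and the second the `gather(...)` call for that rank.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical result type: how long this rank waited and how long it worked.
struct PhaseTiming {
    double wait_seconds;  // time inside the barrier, i.e. waiting for slower ranks
    double op_seconds;    // time inside the collective operation itself
};

// Run barrier() and then op(), timing each part separately on this rank.
template <typename Barrier, typename Op>
PhaseTiming time_phase(Barrier&& barrier, Op&& op) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    barrier();  // assumed stand-in for world.barrier()
    const auto t1 = clock::now();
    op();       // assumed stand-in for gather(world, ...)
    const auto t2 = clock::now();
    return PhaseTiming{std::chrono::duration<double>(t1 - t0).count(),
                       std::chrono::duration<double>(t2 - t1).count()};
}

// Share of the measured time spent waiting at the barrier (0 if nothing was measured).
double wait_fraction(const PhaseTiming& t) {
    assert(t.wait_seconds >= 0.0 && t.op_seconds >= 0.0);
    const double total = t.wait_seconds + t.op_seconds;
    return total > 0.0 ? t.wait_seconds / total : 0.0;
}
```

A wait fraction close to 1 on the early ranks points to load imbalance before the gather, while a large `op_seconds` points to the communication itself. Because the root and the other ranks call different `gather` overloads in the question's code, each branch would pass its own call as the second argument.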
