
I want to run a particular MPI function under google benchmark. Something like:

#include <mpi.h>
#include <benchmark/benchmark.h>

template<class Real>
void MPIInitFinalize(benchmark::State& state)
{

    auto mpi = []() {
        MPI_Init(nullptr, nullptr);
        foo();
        MPI_Finalize();
    };

    for(auto _ : state) {
       mpi();
    }
}

BENCHMARK_TEMPLATE(MPIInitFinalize, double);

BENCHMARK_MAIN();

Of course, we know what will happen:

*** The MPI_Init() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

I understand that MPI isn't cool with what I want to do. But google benchmark is simply too useful to not at least try to find a hack to make this work.

Is there anything that can be done? Can I fork a process and pass the lambda to it? Is there a threading pattern that will work? Even expensive workarounds would help, since I can subtract the cost of running the hack without a call to foo() from the cost of running it with the call to foo().

user14717
  • Do you really want to include `MPI_Init` and `MPI_Finalize` in the benchmark measurements? If not, why not just make your own main? – Zulan Aug 13 '19 at 15:00
  • Also, how do you start your benchmark? With `mpirun`? If not, you are only testing in singleton mode (i.e. a single MPI task), which is unlikely to be what you want. – Gilles Gouaillardet Aug 14 '19 at 02:14

1 Answer


If you don't need to include MPI_Init and MPI_Finalize in your timings (and you probably don't want to anyway), you can take a look at this gist: https://gist.github.com/mdavezac/eb16de7e8fc08e522ff0d420516094f5

It contains an example of how to benchmark MPI-enabled code with google benchmark. The basic idea is to call google benchmark from your own main function (using ::benchmark::Initialize(&argc, argv) and ::benchmark::RunSpecifiedBenchmarks()), synchronize using MPI_Barrier, time your code using std::chrono::high_resolution_clock, and use MPI_Allreduce to find the slowest process. You can then publish that time using state.SetIterationTime (but only on the main process).

Adrodoc