
I'm trying to get MPI to disconnect a communicator, which is a tetchy business; I've put together a demo below. It has two versions of the same idea, listening for an int: one uses MPI_Irecv, the other a boost::mpi::request.

You'll note when using mpiexec -n 2 on this program that version A will happily disconnect and exit, but version B will not. Is there some trick to MPI_Request_free-ing a boost::mpi::request? That seems to be the difference here. If it matters, I'm using MSVC and MSMPI, and Boost 1.62.

#include "boost/mpi.hpp"
#include "mpi.h"

int main()
{
    MPI_Init(NULL, NULL);
    MPI_Comm regional;
    MPI_Comm_dup(MPI_COMM_WORLD, &regional);
    boost::mpi::communicator comm = boost::mpi::communicator(regional, boost::mpi::comm_attach);
    if (comm.rank() == 1)
    {
        int q;

        //VERSION A:
//      MPI_Request n;
//      int j = MPI_Irecv(&q, 1, MPI_INT, 1, 0, regional, &n);
//      MPI_Cancel(&n);
//      MPI_Request_free(&n);

        //VERSION B:

//      boost::mpi::request z = comm.irecv<int>(1, 0, q);
//      z.cancel();

    }
    MPI_Comm_disconnect(&regional);
    MPI_Finalize();
    return 0;
}

Did I find a bug? I doubt I'm deep in the code.

alfC
Carbon
  • comm_duplicate greatly improves the situation but you still should be able to do this. – Carbon May 19 '17 at 21:22
  • Version B returns without problem on Linux and boost 1.61. – Shibli May 19 '17 at 21:27
  • @Shibli most likely depends on the MPI implementation. Example code does not block here for OpenMPI 1.10. But it's easy to see that it's incorrect from the boost sources / MPI standard. – Zulan May 19 '17 at 21:34

1 Answer


Well, I guess it's not a bug if it's documented: MPI_Request_free is unsupported by Boost.MPI.

Now going back to MPI itself:

A call to MPI_CANCEL marks for cancellation a pending, nonblocking communication operation (send or receive). The cancel call is local. It returns immediately, possibly before the communication is actually cancelled. It is still necessary to call MPI_REQUEST_FREE, MPI_WAIT or MPI_TEST (or any of the derived operations) with the cancelled request as argument after the call to MPI_CANCEL. If a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function);

That means, just:

z.cancel();
z.wait();

and you should be fine.

Now, IMHO this is a bad waste of proper RAII by Boost.MPI.
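To illustrate what I mean by RAII here, a minimal sketch of a guard that does the cancel-then-wait for you on scope exit. The names cancel_on_exit, fake_request and run_demo are made up for this example and are not part of Boost.MPI; the guard only assumes the request type has cancel() and wait() members, which boost::mpi::request does. The fake request is there so the sketch runs without MPI.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical guard (not part of Boost.MPI): on destruction it cancels the
// request and then waits on it, so the request is completed before the
// communicator is disconnected.
template <class Request>
class cancel_on_exit {
public:
    explicit cancel_on_exit(Request& r) : req_(&r) {}
    cancel_on_exit(const cancel_on_exit&) = delete;
    cancel_on_exit& operator=(const cancel_on_exit&) = delete;

    // Call this once the request has completed normally, so the destructor
    // does not try to cancel it.
    void dismiss() { req_ = nullptr; }

    ~cancel_on_exit() {
        if (!req_) return;
        try {
            req_->cancel();  // marks the request for cancellation (local call)
            req_->wait();    // completes the cancelled request
        } catch (...) {
            // Never let an exception escape a destructor.
        }
    }

private:
    Request* req_;
};

// Stand-in for a real request type; it only records which calls were made.
struct fake_request {
    std::vector<std::string>* log;
    void cancel() { log->push_back("cancel"); }
    void wait() { log->push_back("wait"); }
};

// Runs the guard over a fake request and returns the recorded calls.
std::vector<std::string> run_demo() {
    std::vector<std::string> log;
    fake_request z{&log};
    {
        cancel_on_exit<fake_request> guard(z);
        assert(log.empty());  // nothing happens until the guard is destroyed
    }                         // guard destructor: cancel(), then wait()
    return log;
}
```

With the code from the question, this would presumably be used as cancel_on_exit<boost::mpi::request> guard(z); right after the irecv, as long as the guard goes out of scope before MPI_Comm_disconnect is called. I have not tested that against MSMPI.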

Zulan
  • Ah, got it. That makes no sense whatsoever, but OK. – Carbon May 19 '17 at 21:28
  • If it's any consolation - Boost.MPI regularly surprises me with stupid big and little issues and by breaking stuff that works just fine with normal MPI. Or is there an aspect about the standard that doesn't seem sensible to you? – Zulan May 19 '17 at 21:40
  • Nah, I get what normal MPI is doing, the whole cancel and wait thing was a bit of a surprise. Boost MPI is lovely but will sometimes wake you up. – Carbon May 19 '17 at 21:43