I am trying to test the effects of MPI_Send without MPI_Recv. I have the following program, which I compile and run using openmpi-1.4.5 and mvapich2-1.9. I am aware that these implementations target two different versions of the MPI standard, but I think MPI_Send and MPI_Recv are the same across those versions:

#include <mpi.h>
#include <pthread.h>
#include <iostream>
#include <assert.h>

using namespace std;

MPI_Comm ping_world;
int mpi_size, mpi_rank;

// Thread body: endlessly send "PING" to every rank (including ourselves)
// without ever posting a matching receive.
void* ping(void* args)
{
    int ctr = 0;
    while(1)
    {
        char buff[6] = "PING";
        ++ctr;
        for(int i=0; i<mpi_size; ++i)
        {
            cout << "[" << ctr << "] Rank " << mpi_rank << " sending " << buff << " to rank " << i << endl;
            MPI_Send(buff, 6, MPI_CHAR, i, 0, ping_world);
        }
    }
    return NULL; // never reached
}

int main(int argc, char *argv[])
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    assert(provided == MPI_THREAD_MULTIPLE);

    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

    // Build a communicator containing all ranks for the ping traffic.
    {
        MPI_Group orig_group;
        MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
        int ranks[mpi_size];
        for(int i=0; i<mpi_size; ++i)
            ranks[i] = i;

        MPI_Group new_group;
        MPI_Group_incl(orig_group, mpi_size, ranks, &new_group);
        MPI_Comm_create(MPI_COMM_WORLD, new_group, &ping_world);
    }

    pthread_t th_ping;
    pthread_create(&th_ping, NULL, ping, (void *) NULL);

    pthread_join(th_ping, NULL);

    return 0;
}

With mvapich2, I always get the following output (nothing more than this); the program seems to hang after printing these 3 lines:

[1] Rank 0 sending PING to rank 0
[1] Rank 1 sending PING to rank 0
[1] Rank 1 sending PING to rank 1

With openmpi, I get the following output (unending):

[1] Rank 1 sending PING to rank 0
[1] Rank 1 sending PING to rank 1
[1] Rank 0 sending PING to rank 0
[1] Rank 0 sending PING to rank 1
[2] Rank 0 sending PING to rank 0
[2] Rank 0 sending PING to rank 1
[3] Rank 0 sending PING to rank 0
[3] Rank 0 sending PING to rank 1
[4] Rank 0 sending PING to rank 0
[4] Rank 0 sending PING to rank 1
[5] Rank 0 sending PING to rank 0
[2] Rank 1 sending PING to rank 0
[2] Rank 1 sending PING to rank 1
[3] Rank 1 sending PING to rank 0
[3] Rank 1 sending PING to rank 1
[4] Rank 1 sending PING to rank 0
[4] Rank 1 sending PING to rank 1
[5] Rank 1 sending PING to rank 0
[5] Rank 1 sending PING to rank 1
[6] Rank 1 sending PING to rank 0

Questions:

  1. Why is there such a difference?
  2. How do I achieve behavior similar to openmpi's (unending output) with mvapich2?
Keval

3 Answers

MPI_Send can return as soon as the send buffer can be safely reused by the calling program. Nothing else is guaranteed; the rest is implementation-dependent, and different implementations handle the buffering of messages differently. Eager protocols, for example, allow short(er) messages to be transported to the receiving rank without a matching MPI_Recv having been posted.

If you need MPI to enforce the message being received before the blocking send returns, look at MPI_Ssend.
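
For illustration, here is a minimal sketch (not part of the original answer) of the question's ping thread rewritten to use MPI_Ssend; the synchronous send only completes once the matching receive has started, so in a program with no MPI_Recv it blocks on the very first destination. Variable names are borrowed from the question's code:

// Hypothetical variant of the question's ping thread using a synchronous send.
// MPI_Ssend does not return until the matching receive has been posted, so
// this version blocks immediately in a program that never calls MPI_Recv.
void* ping_ssend(void* args)
{
    int ctr = 0;
    while(1)
    {
        char buff[6] = "PING";
        ++ctr;
        for(int i=0; i<mpi_size; ++i)
        {
            cout << "[" << ctr << "] Rank " << mpi_rank << " sending (Ssend) " << buff << " to rank " << i << endl;
            MPI_Ssend(buff, 6, MPI_CHAR, i, 0, ping_world);
        }
    }
    return NULL;
}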

Stan Graves
  • Thanks Stan. So does that mean RDMA eager protocols are not implemented in mvapich2? Or do I need to give a compile-time / run-time switch to enable such protocols? – Keval Sep 01 '13 at 07:18
  • MVAPICH is not my primary MPI implementation. Last time I checked, InfiniBand and other RDMA interconnects were build-time options for MVAPICH, so it will take a recompile/relink of the MVAPICH source to enable an RDMA interconnect. This should be well described in the documentation. It may be possible to find pre-built versions of MVAPICH that have RDMA interconnects already enabled. – Stan Graves Sep 02 '13 at 01:43
  • Eager protocols have nothing to do with RDMA. An eager protocol is one that pushes (small) messages to their destination before the receive operation is even posted. It could also be implemented over non-RDMA-capable transports like TCP/IP. – Hristo Iliev Sep 17 '13 at 18:18
  • @HristoIliev...you are correct. I typically only use the Eager protocol with RDMA...and in my answer I incorrectly stated that as if there were an enforced connection. I edited the original answer to clarify. – Stan Graves Sep 18 '13 at 02:44
  • The difference between implementations appears with MPI_Send. You can select a specific behavior using the already proposed [MPI_Ssend](http://mpich.org/static/docs/latest/www3/MPI_Ssend.html), or others such as [MPI_Bsend](http://mpich.org/static/docs/latest/www3/MPI_Bsend.html) (buffered send), [MPI_Rsend](http://mpich.org/static/docs/latest/www3/MPI_Rsend.html) (ready send), etc., all of which are fully specified by the MPI standard (see the buffered-send sketch after this list). – Jorge Bellon Jan 10 '17 at 08:12
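
As a rough illustration of the buffered-send option mentioned in the last comment, the sketch below attaches a user buffer and sends with MPI_Bsend, which copies the message locally and returns whether or not a receive is ever posted. The function name and buffer sizing are illustrative, not taken from the question:

// Sketch: MPI_Bsend completes once the message is copied into the attached
// buffer, even if no matching MPI_Recv is ever posted.
// Requires <cstdlib> (or <stdlib.h>) for malloc/free.
void ping_bsend_once(MPI_Comm comm, int nranks)
{
    // Room for one 6-byte message per destination plus the mandatory
    // per-message bookkeeping overhead.
    int bufsize = nranks * (6 + MPI_BSEND_OVERHEAD);
    char* bsend_buf = (char*) malloc(bufsize);
    MPI_Buffer_attach(bsend_buf, bufsize);

    char msg[6] = "PING";
    for(int i=0; i<nranks; ++i)
        MPI_Bsend(msg, 6, MPI_CHAR, i, 0, comm);  // returns after the local copy

    // Note: MPI_Buffer_detach waits until the buffered messages have been
    // delivered, so in a no-receiver experiment it may itself block.
    MPI_Buffer_detach(&bsend_buf, &bufsize);
    free(bsend_buf);
}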

It's incorrect for an MPI program to send data that is never received. The problem you're seeing is that your sends don't match any receives. Depending on the implementation, MPI_SEND might block until the message is actually received on the other end. In fact, all implementations that I know of will do this for sufficiently large messages (though your 6-byte message probably isn't hitting that threshold anywhere).

If you want to send messages without blocking, you need to use MPI_ISEND. However, even then you eventually need to call MPI_TEST or MPI_WAIT to be sure that the data was actually sent, rather than just buffered locally.
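
A minimal sketch of that pattern, reusing the buffer and communicator names from the question with an illustrative destination rank dest, could look like this:

// Sketch: non-blocking send followed by a wait for local completion.
// MPI_Wait returning only guarantees the send buffer may be reused; it still
// does not guarantee the message was received on the other side.
char buff[6] = "PING";
MPI_Request req;
MPI_Isend(buff, 6, MPI_CHAR, dest, 0, ping_world, &req);
/* ... overlap other work here while the send is in flight ... */
MPI_Wait(&req, MPI_STATUS_IGNORE);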

I'm not sure about the specifics of why MVAPICH2 hangs while Open MPI doesn't, but in the end it doesn't really matter: you need to modify your program, because you're testing a case that shouldn't be relied upon anyway.

Wesley Bland
  • Well, the case I am testing matters to me :) I want to know whether MVAPICH2 enforces that `MPI_Send` returns only if it finds a corresponding `MPI_Recv` (or something equivalent) at the other end or not. From what it looks like, openmpi does not enforce this condition. Of course there should be an `MPI_Recv` at the other end, and I have purposely coded the program **not to** have one. – Keval Aug 30 '13 at 17:46
  • MPI_Isend is just an immediate (non-blocking) form of MPI_Send: even if you wait or test for completion, there is no guarantee that the message was received. There are synchronous immediate versions of send too: [MPI_Issend](http://mpich.org/static/docs/latest/www3/MPI_Issend.html) – Jorge Bellon Jan 10 '17 at 08:14

In the MVAPICH2 (and MPICH) implementation, a blocking send to the sending rank itself is blocked (not buffered) until the corresponding MPI_Recv is posted. That is why each rank gets past its send to the other rank (for example "[1] Rank 1 sending PING to rank 0") but hangs on the send to itself. It is just an implementation choice.
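
As a hedged sketch (not from the answer itself), one way to keep the send-to-self from blocking under MPICH-derived implementations is to pre-post a non-blocking receive for the self-message before calling the blocking send, using the communicator and rank variables from the question's code:

// Sketch: pre-post a receive for the message this rank sends to itself so the
// blocking self-send has a matching receive and can complete locally.
char sendbuf[6] = "PING";
char recvbuf[6];
MPI_Request rreq;
MPI_Irecv(recvbuf, 6, MPI_CHAR, mpi_rank, 0, ping_world, &rreq);  // matches the self-send below
MPI_Send(sendbuf, 6, MPI_CHAR, mpi_rank, 0, ping_world);          // now able to complete
MPI_Wait(&rreq, MPI_STATUS_IGNORE);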