
Background: rank 0 sends a message to rank 1; after rank 1 completes its work, it sends a message back to rank 0.

In rank 0 I run one thread for sending messages and another for receiving, like this:

int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        if(tag == 1)
        {
            MPI_Send(...,1,TAG_SEND,...); //send something to slave
            tag = 0;
        }
    }
    ...
}

void* thread_receive(void* argc)
{
    while(1)
    {
      MPI_Recv(...,0,TAG_RECV,...); //ready for receiving from slave
      tag = 1;
    }
}

In rank 1 I run a thread like this:

void* slave(void* argc)
{   
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case TAG_SEND:
            MPI_Recv(..,0,TAG_SEND,..);
            break;
        }
        MPI_Send(...,0,TAG_RECV,...); //notify rank 0 that the slave has finished its work
    }
}

Then I got an error like this:

    [comp01-mpi.gpu01.cis.k.hosei.ac.jp][[54135,1],0]
    [btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] 
    received unexpected      process identifier [[16641,0],301989888]

In fact, each machine has several network interfaces. I know that might be a problem, so I pass the parameters --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 so that Open MPI only uses eth0.

Have I done something wrong? I would appreciate any suggestions. Thanks.

Thanks to @HristoIliev, I checked the thread support of my Open MPI installation like this:

    MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
    if(provide_level < MPI_THREAD_MULTIPLE){
        printf("Error: the MPI library doesn't provide the required thread level\n");
        MPI_Abort(MPI_COMM_WORLD,0);
    }

and I got the error:

Error: the MPI library doesn't provide the required thread level

That means I cannot use multiple threads, so what else can I do?

Now I am using non-blocking sends (Isend) and receives (Irecv). The code looks like this:

send thread:

int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            MPI_Wait(&request,&status);
        }

        MPI_Send(...,1,MSG_SEND,...); //send something to slave
        tag = 0;
    }
    ...
}

receive thread:

void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case MSG_SEND:
            MPI_Recv(..,0,MSG_SEND,..);
            break;
        }
        int tag = 1;
        MPI_Isend(&tag,1,MPI_INT,0,MSG_TAG,MPI_COMM_WORLD,&request); //notify rank 0 that the slave has finished its work
        MPI_Wait(&request,&status);
        printf("slave is idle now \n");
    }
}

and it printed this:

tag is 0
slave is idle now

and then it hangs here.

Jerry
  • Do you initialise MPI properly using `MPI_Init_thread` and thread level of `MPI_THREAD_MULTIPLE`? Is your Open MPI compiled with support for that thread level? – Hristo Iliev Mar 27 '14 at 16:05
  • @HristoIliev Oh, you always hit me, I never thought that MPI installed in the cluster doesn't support multiple thread level, but the fact is it does not support it. It's so disappointing. Is it possible to receive messages at the same time sending messages? – Jerry Mar 28 '14 at 06:52
  • In code you should use non-blocking sends and/or receives; those can usually be progressed together. But in the end it all depends on whether the network hardware is full duplex. Most modern networks are. – Hristo Iliev Mar 28 '14 at 08:01
  • @HristoIliev The non-blocking sends and receives should have worked, but they didn't. I have updated the code in my question; would you mind looking it over? – Jerry Mar 31 '14 at 02:26

1 Answer


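I solved the problem by moving the MPI_Irecv() call. In the original loop, the first pass posted an MPI_Irecv() and then left the loop immediately because tag was still 1, so that request was never waited on. The next pass posted a second MPI_Irecv() and waited on it, but the slave's single notification was matched by the first, abandoned receive, so MPI_Wait() never returned. Posting the receive only after the check pairs every MPI_Irecv() with its own MPI_Wait(), as follows: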

send thread:

int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            if(tag == 1) break; // slave has reported that it is idle
            printf("tag is %d\n",tag);
            // post the receive only when we are about to wait for it
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            MPI_Wait(&request,&status);
        }

        MPI_Send(...,1,MSG_SEND,...); //send something to slave
        tag = 0;
    }
    ...
}

In conclusion, to send and receive messages at the same time, you can use multiple threads if your MPI library supports the MPI_THREAD_MULTIPLE thread level. You can check this when you initialize your MPI program, like this:

    MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
    if(provide_level < MPI_THREAD_MULTIPLE){
        printf("Error: the MPI library doesn't provide the required thread level\n");
        MPI_Abort(MPI_COMM_WORLD,0);
    }

Or, if your MPI library doesn't support MPI_THREAD_MULTIPLE, you can use non-blocking communication instead.

Jerry