#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
//Please run this program with 4 processes
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    // Check that 4 MPI processes are used
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    if (comm_size != 4)
    {
        printf("This application is meant to be run with 4 MPI processes, not %d.\n", comm_size);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    // Get my rank in the global communicator
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // Determine the colour and key based on whether my rank is even.
    int colour;
    int key;
    if (my_rank % 2 == 0)
    {
        colour = 0;
        key = my_rank;
    }
    else
    {
        colour = 1;
        key = comm_size - my_rank;
    }
    // Split the global communicator
    MPI_Comm new_comm;
    MPI_Comm_split(MPI_COMM_WORLD, colour, key, &new_comm);
    // Get my rank in the new communicator
    int my_new_comm_rank;
    MPI_Comm_rank(new_comm, &my_new_comm_rank);
    // Print my new rank and new communicator
    printf("I am process %d and I belong to %x\n", my_rank,new_comm);
    MPI_Finalize();
    return EXIT_SUCCESS;
}

The code above is supposed to divide 4 processes into 2 subcommunicators, with processes 0 and 2 in one and processes 1 and 3 in the other. However, the output of this program is:

I am process 3 and I belong to 84000000
I am process 1 and I belong to 84000000
I am process 2 and I belong to 84000000
I am process 0 and I belong to 84000000

What doesn't make sense is that they all seem to belong to the same subcommunicator (84000000). It looks as if the split fails to put them into different subcommunicators. For reference, I am running this on Windows with MS-MPI.

  • A communicator is an opaque handle that you should not print. Moreover, this handle is local to each rank, so you cannot compare communicators between ranks based on its value. What if you print the size and rank in the **new** communicator (you did **not** print `my_new_comm_rank`)? Does the output make more sense now? – Gilles Gouaillardet Apr 23 '23 at 12:09
  • @GillesGouaillardet perhaps you are right –  Apr 23 '23 at 14:19
  • maybe only, it happens sometimes... – Gilles Gouaillardet Apr 23 '23 at 14:48

1 Answer

You are thinking in shared memory terms. MPI uses distributed memory: each process has its own address space. Thus, the value 84000000 on one process refers to a completely different object than the same value on another process. It's pure coincidence that the printed values match.

So you might wonder, how can I test if these subcommunicators are indeed the same? And the answer is: you can't. If two processes are in different communicators, they can't even see the other one. Think about it: how would you have a handle to a communicator that you are not in?

Victor Eijkhout