
For the problem I'd like to discuss, let's take MPI_Barrier as an example. The MPI-3 standard states:

> If comm is an intracommunicator, MPI_BARRIER blocks the caller until all group members have called it. The call returns at any process only after all group members have entered the call.

So I was wondering (essentially the same applies to all collective operations in general) how this assertion is to be interpreted when some processes of the communication context have already exited (successfully) before MPI_Barrier is executed. For example, assume we have two processes A and B and use MPI_COMM_WORLD as the communicator argument comm to MPI_Barrier. After A and B call MPI_Init, B immediately calls MPI_Finalize and exits, and only A calls MPI_Barrier before calling MPI_Finalize. Is A then blocked for eternity? Or is the set of "all group members" defined as the set of all original group members that have not exited yet? I'm pretty sure A is blocked forever, but maybe the MPI standard has more to say about this?
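For illustration, a minimal C program reproducing this scenario might look as follows (which rank plays A and which plays B is an arbitrary choice for the sketch):

```c
/* Sketch of the scenario: rank 1 ("B") finalizes and exits immediately,
 * while rank 0 ("A") enters MPI_Barrier. Run with two ranks, e.g.
 * mpirun -np 2 ./a.out */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {        /* process B */
        MPI_Finalize();     /* B exits without ever entering the barrier */
        return 0;
    }

    /* process A: waits for all members of MPI_COMM_WORLD's group,
     * including B, which never enters the call */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```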

REMARK: This is not a question about the synchronizing properties of MPI_Barrier; the reference to MPI_Barrier is merely meant as a concrete example. It is a question about the correctness of MPI programs that perform collective operations. See the comments.

sperber
  • _"same essentially applies to all collective operations in general"_ - this is a widespread misconception (even the [MPI Tutorial](http://mpitutorial.com/tutorials/mpi-broadcast-and-collective-communication/) is wrong on it). `MPI_BARRIER` is **the only** synchronising MPI collective operation. All the other ones are allowed to exit as early as once the rank's participation is no longer needed. It also means that a given rank may enter and then exit certain implementation-dependent collective calls even before all other ranks have entered the call. – Hristo Iliev Mar 02 '16 at 20:10
  • @HristoIliev: but the synchronizing aspect of MPI_Barrier was not the point of my question. I'm aware of what you are saying, and it is valid to point it out. But my question was basically about the notion of "all group members" (participating in a collective operation), and what this entails. In this sense, the question applies to all collectives. So the point is not so much whether process A hangs because B does not call MPI_Barrier (this IS specific to its synchronizing property), but whether the collective call is executed by the correct "group of all members". – sperber Mar 02 '16 at 20:16
  • Then delete the second sentence of the citation to prevent confusion. Otherwise, a consequence of what I said is that a rank might go happily through a collective call even if some members of the communicator's group are gone, provided that rank does not have to communicate with those members. It is only `MPI_BARRIER` that would **reliably** block or result in an error. – Hristo Iliev Mar 02 '16 at 20:18
  • Also, note that communicator groups are immutable once created, no matter what happens with the ranks afterwards. If you use `MPI_COMM_WORLD`, it is expected that **all** initially started ranks should participate (at least until fault tolerance becomes an integral part of the MPI standard). – Hristo Iliev Mar 02 '16 at 20:26

1 Answer


> If B exits right at program start and only A calls MPI_Barrier, is A blocked for eternity?

Basically yes. But actually, you are not allowed to do that.

Simply speaking, you must call MPI_Finalize on every process before it exits. And MPI_Finalize acts like a collective (on MPI_COMM_WORLD), so it usually does not complete before every process has called it. So in your example, process B didn't exit (at least not correctly).

But I guess the MPI 3.1 standard, Section 8.7, explains it more clearly:

> MPI_Finalize [...] This routine cleans up all MPI state. If an MPI program terminates normally (i.e., not due to a call to MPI_ABORT or an unrecoverable error) then each process must call MPI_FINALIZE before it exits. Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI calls needed to complete its involvement in MPI communications: It must locally complete all MPI operations that it initiated and must execute matching calls needed to complete MPI communications initiated by other processes.

Note how the last sentence also requires you to complete the barrier in your question.

According to the standard, your program is not correct. In practice it will most likely deadlock or hang.
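For completeness, a corrected sketch of the two-process example, in which every rank completes the barrier and only then finalizes, could look like this:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* every rank completes its involvement in all MPI communications,
     * here the barrier on MPI_COMM_WORLD ... */
    MPI_Barrier(MPI_COMM_WORLD);

    /* ... and only then cleans up MPI state and exits */
    MPI_Finalize();
    return 0;
}
```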

Zulan
  • Thanks! I implicitly assumed that B calls MPI_Finalize. I will edit my question accordingly. So your answer would in essence be that the program violates MPI's demand that all MPI processes (in the same communication context) execute the same sequence of collective operations (MPI_Finalize being one of them)?! – sperber Mar 02 '16 at 19:48
  • Yes. And if the standard says your program is not correct, anything could happen. In practice it will most likely deadlock/hang. – Zulan Mar 02 '16 at 19:53