To illustrate the problem I'd like to discuss, let's take MPI_Barrier as an example. The MPI-3 standard states:
If comm is an intracommunicator, MPI_BARRIER blocks the caller until all group members have called it. The call returns at any process only after all group members have entered the call.
So I was wondering (and the same essentially applies to all collective operations in general) how this assertion should be interpreted when some processes of the communicator have already exited (successfully) before MPI_Barrier is executed. For example, let's assume we have two processes A and B and use MPI_COMM_WORLD as the communicator, i.e. as the comm argument to MPI_Barrier. After A and B call MPI_Init, if B immediately calls MPI_Finalize and exits, and only A calls MPI_Barrier before calling MPI_Finalize, is A blocked forever? Or is the set of "all group members" defined as the set of all original group members that have not yet exited? I'm pretty sure A is blocked forever, but maybe the MPI standard has more to say about this?
REMARK: This is not a question about the synchronizing properties of MPI_Barrier; MPI_Barrier merely serves as a concrete example. It is a question about the correctness of MPI programs that perform collective operations. See the comments.