
I am trying to understand the MPI function `MPI_Fetch_and_op()` through a small example and ran into a strange behaviour that I would like to understand.

In the example, the process with rank 0 waits until the processes 1..4 have each incremented the value of result by one before carrying on.

With the default value 0 for assert in MPI_Win_lock_all() I sometimes (about 1 run in 10) get an infinite loop: the value of result[0] in the MASTER never gets past 3. The terminal output looks like the following snippet:

result: 3
result: 3
result: 3
...

According to the documentation the function MPI_Fetch_and_op is atomic.

This operation is atomic with respect to other "accumulate" operations.

First Question: Why is it not updating the value of result[0] to 4?


If I change the value of assert to MPI_MODE_NOCHECK, it seems to work.

Second Question: Why is it working with MPI_MODE_NOCHECK?

According to the documentation I thought this means that mutual exclusion has to be organized by other means. Can someone explain this passage from the documentation of MPI_Win_lock_all()?

MPI_MODE_NOCHECK

No other process holds, or will attempt to acquire a conflicting lock, while the caller holds the window lock. This is useful when mutual exclusion is achieved by other means, but the coherence operations that may be attached to the lock and unlock calls are still required.
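
As I read it, "mutual exclusion achieved by other means" would cover a situation like the following hypothetical fragment (not from my program; it reuses one, r, p and win_res), where every target is only ever locked by a single origin, so no conflicting lock can occur and MPI_MODE_NOCHECK merely skips the conflict checking:

/* Hypothetical: rank r only ever locks rank (r + 1) % p, so no two origins
   compete for the same lock; mutual exclusion comes from the access pattern. */
int target = (r + 1) % p;
MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, MPI_MODE_NOCHECK, win_res);
MPI_Put(&one, 1, MPI_INT, target, 0, 1, MPI_INT, win_res);
MPI_Win_unlock(target, win_res);

Is that roughly what the passage means?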

Thanks in advance!

Example program:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MASTER 0

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  MPI_Comm comm = MPI_COMM_WORLD;
  int r, p;
  MPI_Comm_rank(comm, &r);
  MPI_Comm_size(comm, &p);
  printf("Hello from %d\n", r);
  int result[1] = {0};
  //int assert = MPI_MODE_NOCHECK;
  int assert = 0;
  int one = 1;
  MPI_Win win_res;
  MPI_Win_allocate(1 * sizeof(MPI_INT), sizeof(MPI_INT), MPI_INFO_NULL, comm, &result[0], &win_res);
  MPI_Win_lock_all(assert, win_res);
  if (r == MASTER) {
    result[0] = 0;
    do{
      MPI_Fetch_and_op(&result, &result , MPI_INT, r, 0, MPI_NO_OP, win_res);  
      printf("result: %d\n", result[0]);
    } while(result[0] != 4);
    printf("Master is done!\n");
  } else {
    MPI_Fetch_and_op(&one, &result, MPI_INT, 0, 0, MPI_SUM, win_res);
  }
  MPI_Win_unlock_all(win_res);
  MPI_Win_free(&win_res);
  MPI_Finalize();
  return 0;
}

Compiled with the following Makefile:

MPICC = mpicc
CFLAGS = -g -std=c99 -Wall -Wpedantic -Wextra

all: fetch_and

fetch_and: main.c
    $(MPICC) $(CFLAGS) -o $@ main.c

clean:
    rm fetch_and

run: all
    mpirun -np 5 ./fetch_and
  • Your master loop is within the lock/unlock window. Therefore, if rank `MASTER` manages to obtain the lock before all the other ranks were able to lock, increment, and unlock the window, it will never release the lock as the value of `result[0]` will not change. – Hristo Iliev May 23 '17 at 16:10
  • Thanks for your hint. But as far as I know MPI_Win_lock is not really a lock. At least 4 processes are interacting in the window, not just the first that reaches the function: 0 is reading and three others are updating. The question remains for me: why isn't the fourth process updating the result? – nando May 23 '17 at 16:42
  • And it shouldn't be necessary to close the window inside the loop. In other programs I had it the same way and worked inside the epoch with MPI_Win_flush. That shouldn't be necessary here because MPI_Fetch_and_op is atomic. If I'm wrong here I appreciate any hint. – nando May 23 '17 at 16:45
  • My bad. `MPI_Win_lock_all` places a shared lock, not an exclusive one. What happens if you remove the assignment `result[0] = 0;` from the code for the master rank? – Hristo Iliev May 23 '17 at 17:33
  • Thanks. I will try when I get home tomorrow and let you know. – nando May 23 '17 at 17:48
  • Note that `MPI_Win_allocate` returns the base address of the allocated memory in its 5th argument. You are passing the address of the array `result[]`. On a 64-bit system this will result in a write 4 bytes past the end of the array and eventually mess up the stack. – Hristo Iliev May 23 '17 at 17:50
  • Thanks for the correction. I changed it. But I still get the same results with assert 0. – nando May 24 '17 at 13:56
  • I'm willing to bet that a flush will solve it (roughly as sketched below). That has nothing to do with atomicity: atomicity guarantees that no one else is altering the remote variable between your fetch and op. It does not guarantee consistency of your local data with the remote data. – Victor Eijkhout Feb 22 '19 at 02:36
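
For illustration, the flush suggested in the last comment would look roughly like this inside the master's loop (a sketch reusing win_res and MASTER from the question; dummy and snapshot are hypothetical locals):

int dummy = 0, snapshot;
/* Atomically read the counter, then complete the fetch before using the value. */
MPI_Fetch_and_op(&dummy, &snapshot, MPI_INT, MASTER, 0, MPI_NO_OP, win_res);
MPI_Win_flush(MASTER, win_res);   /* snapshot is only valid after this call */
printf("result: %d\n", snapshot);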

1 Answer


Your code works for me, unchanged. But that may be coincidence. There are many problems with your code. Let me point out what I see:

  • You hard-coded the number of processes in the test result[0] != 4
  • You hard-coded the master rank into MPI_Fetch_and_op(&one, &result, MPI_INT, 0
  • Passing the same address as update and result seems dangerous to me: MPI_Fetch_and_op(&result, &result
  • And my compiler complains about the first parameter since it is in effect an int** (actually int (*)[1])
  • I'm not sure why you don't get the same complaint about the second parameter,
  • ...but I'm not happy about that second parameter anyway, since the fetch operation writes into memory that you designated as the window buffer. I guess the lack of coherence here saves you.
  • You initialize the window with result[0] = 0; but I don't think that is coherent with the window, so again, you may just be lucky.
  • I would think that MPI_Win_allocate(1 * sizeof(MPI_INT), sizeof(MPI_INT), MPI_INFO_NULL, comm, &result[0] would also be some sort of memory corruption since result is an output here, but it is a statically allocated array.
  • Similarly, Win_free tries to deallocate the memory buffer, but that was, as already remarked, a static buffer, so again: memory corruption.
  • Your use of Win_lock_all is not appropriate: it means that one process locks the window on all targets, without any competing locks. You are locking the window on only one process, but from all possible origins. I'd use an ordinary lock.
  • Finally, RMA calls are non-blocking. Normally, consistency is enforced by a Win_fence or Win_unlock. But because you are using a long-lived lock, you need to follow the Fetch_and_op with an MPI_Win_flush_local (see the sketch after this list).

Ok, so that's a dozen cases of, eh, less than ideal programming. Still, in my setup it works. (Sometimes. Sometimes it also hangs.) So you may want to clean up your code a little; a sketch of one possible cleanup follows below. Your logic is correct, but your actual implementation is not.
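
For what it's worth, here is a rough sketch of how I might clean it up: keep the lock_all epoch, let MPI_Win_allocate hand you the buffer, drop the hard-coded constants, and complete every fetch with a flush. Take it as an illustration of the points above, not a verified drop-in replacement (run it with at least two processes):

#include <mpi.h>
#include <stdio.h>

#define MASTER 0

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  MPI_Comm comm = MPI_COMM_WORLD;
  int r, p;
  MPI_Comm_rank(comm, &r);
  MPI_Comm_size(comm, &p);

  int *counter;      /* window memory, provided by MPI_Win_allocate */
  MPI_Win win_res;
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL, comm, &counter, &win_res);

  /* Initialise the counter inside an exclusive epoch on the master,
     and make everyone wait for that before any RMA starts. */
  if (r == MASTER) {
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, MASTER, 0, win_res);
    counter[0] = 0;
    MPI_Win_unlock(MASTER, win_res);
  }
  MPI_Barrier(comm);

  MPI_Win_lock_all(0, win_res);
  if (r == MASTER) {
    int dummy = 0, snapshot = 0;
    do {
      /* Atomic read of the counter; the flush completes the fetch
         before snapshot is inspected. */
      MPI_Fetch_and_op(&dummy, &snapshot, MPI_INT, MASTER, 0, MPI_NO_OP, win_res);
      MPI_Win_flush(MASTER, win_res);
      printf("result: %d\n", snapshot);
    } while (snapshot != p - 1);   /* p - 1 workers, nothing hard-coded */
    printf("Master is done!\n");
  } else {
    int one = 1, fetched;
    MPI_Fetch_and_op(&one, &fetched, MPI_INT, MASTER, 0, MPI_SUM, win_res);
    MPI_Win_flush(MASTER, win_res);   /* complete the increment at the target */
  }
  MPI_Win_unlock_all(win_res);

  MPI_Win_free(&win_res);   /* frees the buffer that MPI_Win_allocate gave us */
  MPI_Finalize();
  return 0;
}

Whether you keep Win_lock_all plus flushes or switch to ordinary per-target exclusive locks is mostly a matter of taste here; the essential part is that a fetched value is only inspected after the corresponding flush.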

Victor Eijkhout