
I was going through the accumulate and atomic MPI RMA calls described in MPI-3. I found that there is an MPI_REPLACE operator which can be used with MPI_ACCUMULATE to get functionality similar to MPI_PUT. From what I understood, concurrent MPI_ACCUMULATE calls to the same target location are not erroneous, unlike concurrent MPI_PUT calls. Because of that restriction on MPI_PUT, my application currently takes an exclusive lock (MPI_LOCK_EXCLUSIVE) whenever it updates data with MPI_PUT. This causes severe performance degradation, since even updates to different memory locations on the target process happen sequentially. Since a shared lock (MPI_LOCK_SHARED) is valid with MPI_ACCUMULATE, is MPI_ACCUMULATE with MPI_REPLACE inside a shared lock always a better alternative than MPI_PUT inside an exclusive lock? Or am I misunderstanding something? Similarly, on a minor note, is MPI_GET_ACCUMULATE with MPI_NO_OP always better than MPI_GET?

So basically my question is: would removing all MPI_PUT calls that are currently synchronized by an exclusive lock, and replacing them with MPI_ACCUMULATE with MPI_REPLACE synchronized by a shared lock, be a valid and better alternative, given that it removes the need to acquire an exclusive lock on the whole target process window?

Yash

1 Answer


MPI_ACCUMULATE with MPI_REPLACE is an atomic put. In general it is neither better nor worse than MPI_PUT, but when you require element-wise atomicity it is almost certainly better than MPI_PUT under an exclusive lock.

The recommended model for MPI-3 RMA is to use MPI_WIN_LOCK_ALL for the lifetime of the window, and to use element-wise RMA operations or some form of mutual exclusion (mentioned in https://stackoverflow.com/a/75927929/2189128) for anything else.

Use MPI_WIN_FLUSH(_LOCAL)(_ALL) to achieve the appropriate synchronization without terminating the epoch. Use _LOCAL versions if you only care about reusing the buffer, or if the RMA operation has round-trip semantics (e.g. get, get_accumulate, fetch_and_op, compare_and_swap). Use _ALL versions to complete at all targets in the window, as opposed to just one.

Jeff Hammond
  • But to ensure that the RMA call completes before I proceed to a particular next instruction, I need to unlock the window, right? So will having just MPI_WIN_LOCK_ALL for the lifetime of the window be suitable? – Yash Apr 25 '23 at 06:18
  • I updated the answer to address this. – Jeff Hammond Apr 26 '23 at 07:29