Is MPI_ACCUMULATE with MPI_REPLACE always a better option than MPI_PUT?

Question

I was going through the accumulate and atomic MPI RMA calls which are introduced in MPI-3. After reading, I found that there is an MPI_REPLACE operator which can be used in MPI_ACCUMULATE to get functionality similar to MPI_PUT. From what I understood, concurrent MPI_ACCUMULATE calls are not erroneous, unlike concurrent MPI_PUT calls. Because of that, in my application I use an EXCLUSIVE_LOCK for MPI_PUT whenever I want to update data. But this causes severe performance degradation, as even updates to different memory locations on the target process happen sequentially.

Since SHARED_LOCK is valid with MPI_ACCUMULATE, is using MPI_ACCUMULATE with MPI_REPLACE inside a SHARED_LOCK always a better alternative than using MPI_PUT with an EXCLUSIVE_LOCK? Or am I misunderstanding something? Similarly, on a minor note, is MPI_GET_ACCUMULATE with MPI_NO_OP always better than MPI_GET?

So basically my question is: would removing all MPI_PUT calls that are currently synchronized by an EXCLUSIVE_LOCK and replacing them with MPI_ACCUMULATE with MPI_REPLACE synchronized by a SHARED_LOCK be a valid and better alternative, given that it removes the need to get an EXCLUSIVE_LOCK on the whole target process window?

Jeff Hammond · Answer 1 · 2023-04-26T07:30:27.117

MPI_ACCUMULATE with MPI_REPLACE is an atomic put. It is neither better nor worse than MPI_PUT in general, but it is almost certainly better than MPI_PUT using exclusive locks when one requires element-wise atomicity.

The recommended model for MPI-3 RMA is to use MPI_WIN_LOCK_ALL for the lifetime of the window, and use element-wise RMA operations or some form of mutual exclusion (mentioned in https://stackoverflow.com/a/75927929/2189128) for anything else.

Use MPI_WIN_FLUSH(_LOCAL)(_ALL) to achieve the appropriate synchronization without terminating the epoch. Use _LOCAL versions if you only care about reusing the buffer, or if the RMA operation has round-trip semantics (e.g. get, get_accumulate, fetch_and_op, compare_and_swap). Use _ALL versions to complete at all targets in the window, as opposed to just one.

edited Apr 26 '23 at 07:30

answered Apr 04 '23 at 09:53

Jeff Hammond

But to ensure that the RMA call completes before I proceed to a particular next instruction, I need to unlock the window, right? So will having just the MPI_WIN_LOCK_ALL for the lifetime of the window be suitable? – Yash Apr 25 '23 at 06:18
I updated the answer to address this. – Jeff Hammond Apr 26 '23 at 07:29