I was going through the accumulate and atomic MPI RMA calls introduced in MPI-3. From what I read, there is an MPI_REPLACE operator that can be used with MPI_Accumulate to get the same effect as MPI_Put. I also understood that concurrent MPI_Accumulate calls to overlapping target locations are not erroneous, unlike concurrent MPI_Put calls. Currently, whenever my application updates data, I wrap the MPI_Put in an exclusive lock (MPI_LOCK_EXCLUSIVE). This causes severe performance degradation, because even updates to different memory locations on the target process happen sequentially. Since a shared lock (MPI_LOCK_SHARED) is valid with MPI_Accumulate, is using MPI_Accumulate with MPI_REPLACE inside a shared lock always a better alternative to using MPI_Put inside an exclusive lock? Or am I misunderstanding something? Also, on a minor note, is MPI_Get_accumulate with MPI_NO_OP similarly always better than MPI_Get?
So, basically, my question is: would removing all MPI_Put calls that are currently synchronized by an exclusive lock, and replacing them with MPI_Accumulate using MPI_REPLACE synchronized by a shared lock, be a valid and better alternative, since it removes the need to acquire an exclusive lock on the whole target process window?