
I am working on an MPI code in which I am trying to use one-sided communication (RMA). I create a window using `MPI_Win_create`. The code works and gives correct results, but one process still exits with a segmentation fault, and I am unable to figure out why.

The code I am working on is quite big, but I can reproduce the same error with the following minimal example.

#include <stdio.h>
#include <mpi.h>
#include <boost/mpi.hpp>
#include <boost/mpi/collectives.hpp>
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/vector.hpp>

int main(int argc, char *argv[])
{
    boost::mpi::environment env(argc, argv);  // MPI_Init on construction, MPI_Finalize on destruction
    boost::mpi::communicator world;
    printf("init\n");

    // Fill a local buffer with rank-dependent values.
    int *arr = new int[100];
    for (int i = 0; i < 100; i++)
    {
        arr[i] = i + world.rank() * 100;
    }

    // Expose the buffer as an RMA window, then free the window again.
    MPI_Win win;
    printf("create window\n");
    MPI_Win_create(arr, MPI_Aint(100 * sizeof(int)), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_free(&win);
    printf("done\n");

    delete[] arr;
    return 0;
}
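For reference, I build the reproducer with `mpic++ repro.cpp -lboost_mpi -lboost_serialization -o repro` and launch it with `mpirun -np 64 --hostfile hostfile ./repro` (the source and hostfile names here are just placeholders; the exact Boost library names may differ on your system).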

All the print statements are printed correctly by every process, but one process still exits with a segfault. This is the error output I got:

[gpu023:32302:0:32318] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2b194396e760)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node gpu023 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Can anyone please help me out with this? I have been struggling to find the cause. I am running this code with 64 processes across 2 nodes. The code runs without any errors if I remove the three window-related statements: the `MPI_Win` declaration, `MPI_Win_create`, and `MPI_Win_free`.
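Update: as suggested in the comments, I also tried adding a fence between creating and freeing the window, and I still get the same segfault. For reference, this is roughly the variant I tried (only the window-related lines shown):

MPI_Win_create(arr, MPI_Aint(100 * sizeof(int)), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);
MPI_Win_fence(0, win);  // collective synchronisation over the window
MPI_Win_free(&win);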

Yash
  • What if you add `MPI_Win_fence(0, window);` in between the creation and freeing of the window? I personally cannot reproduce your segfault, but it seems to be a synchronisation bug – Post Self Feb 02 '23 at 12:10
  • 1
    which Open MPI version are you running and how many nodes are you using? if more than one node, what is your interconnect? if Infiniband, are you using UCX and if yes, which version? – Gilles Gouaillardet Feb 02 '23 at 12:32
  • @PostSelf I still get the same segfault if I add `MPI_Win_fence(0, win)` – Yash Feb 02 '23 at 17:47
  • @GillesGouaillardet The Open MPI version is 3.1.6, and I am running it on 2 nodes. The UCX version is 1.8.0 – Yash Feb 02 '23 at 17:53
  • 1
    Open MPI 3.1.6 is officially retired, so I suggest you try a supported version such as `4.1.4` or `4.0.7` – Gilles Gouaillardet Feb 03 '23 at 05:28
  • Thanks @GillesGouaillardet, upgrading to 4.1.1 solved the problem – Yash Feb 14 '23 at 18:25

0 Answers