In MPI (MPICH) I am trying to use windows. I have a 3D grid topology and an additional communicator, i_comm.

  MPI_Comm cartcomm;
  int periods[3]={1,1,1}, reorder=0, coords[3];
  int dims[3]={mesh, mesh, mesh};  // mesh is the size of each dimension
  MPI_Dims_create(size, 3, dims);
  MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods,reorder, &cartcomm);
  MPI_Cart_coords(cartcomm, my_rank, 3, coords);

  MPI_Comm i_comm;
  int i_remain_dims[3] = {false, true, false};
  MPI_Cart_sub(cartcomm, i_remain_dims, &i_comm);
  int i_rank;
  MPI_Comm_rank(i_comm, &i_rank);

  MPI_Win win_PB;

  int * PA = (int *) malloc(r*r*sizeof(int)); //r is input size 
  int * PB = (int *) malloc(r*r*sizeof(int));

  /* arrays are initialized*/

Then I create the window and afterwards try to use MPI_Get:

  if(i_rank == 0){
      MPI_Win_create(PB, r*r*sizeof(int), sizeof(int), MPI_INFO_NULL, i_comm, &win_PB);
    }
  else{
      MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, i_comm, &win_PB);
    }

  MPI_Win_fence(0, win_PB);
  if(i_rank != 0){
      MPI_Get(PB, r*r*sizeof(int), MPI_INT, 0, 0, r*r*sizeof(int), MPI_INT, win_PB);
  }
  MPI_Win_fence(0, win_PB);

With this code I get a long output of errors:

[ana:24006] *** Process received signal ***
[ana:24006] Signal: Segmentation fault (11)
[ana:24006] Signal code: Address not mapped (1)
[ana:24006] Failing at address: 0xa8

Also, without using MPI_Win_fence, the MPI_Get call fails with the error MPI_ERR_RMA_SYNC: error executing rma sync, and I am not sure whether that is expected.

What I observed is that if I declare the arrays in reverse order, it works fine:

  int * PB = (int *) malloc(r*r*sizeof(int));
  int * PA = (int *) malloc(r*r*sizeof(int));

The problem is that I will need to create another communicator and another window for the PA buffer, so just switching the order of the lines does not help in the end.

I would highly appreciate any help figuring out what I am doing wrong.

Ana Khorguani
  • What if you `MPI_Get(PB, r*r, MPI_INT, 0, 0, r*r, MPI_INT, win_PB);` ? – Gilles Gouaillardet Dec 27 '19 at 23:40
  • @GillesGouaillardet Thanks a lot, that worked like a charm. I feel stupid that I did not notice that I was not giving correct size of the buffer. – Ana Khorguani Dec 28 '19 at 08:33
  • no problem, everyone needs an extra pair of eyes once in a while... – Gilles Gouaillardet Dec 28 '19 at 08:35
  • @GillesGouaillardet True :) Also, is there any chance you could help me understand why I get errors when I don't put the get call between fences? I get that it's necessary to be sure that I am reading what was written, but I did not realize before that I would get an error otherwise – Ana Khorguani Dec 28 '19 at 08:40
  • basically, there are two ways of doing RMA: active and passive target. both have their own semantics, which are maybe not what one would naively expect. I am afraid this is pretty much all I know. – Gilles Gouaillardet Dec 28 '19 at 09:08
  • @GillesGouaillardet ok, I think I get the point. I took a look and in all related questions either they use mpi_lock function surrounding get function or fence I guess. well, thank you again very much for helping – Ana Khorguani Dec 28 '19 at 09:13
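
To spell out the fix from the comments: the origin and target counts passed to MPI_Get are numbers of elements of the given datatype, not bytes, so r*r*sizeof(int) combined with MPI_INT describes sizeof(int) times more data than PB holds. The MPI-free sketch below only checks that arithmetic. The helper names transfer_bytes, transfer_fits and checked_count are hypothetical and not part of MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes touched by a transfer of `count` elements,
   each `elem_size` bytes wide (e.g. sizeof(int) for MPI_INT). */
size_t transfer_bytes(size_t count, size_t elem_size) {
    return count * elem_size;
}

/* Hypothetical helper: 1 if the transfer fits into a buffer of `buf_bytes`
   bytes, 0 if it would run past the end of the buffer. */
int transfer_fits(size_t count, size_t elem_size, size_t buf_bytes) {
    return transfer_bytes(count, elem_size) <= buf_bytes;
}

/* Hypothetical helper: abort early instead of letting the transfer overrun
   the buffer; returns the count as the int that MPI_Get expects. */
int checked_count(size_t count, size_t elem_size, size_t buf_bytes) {
    assert(transfer_fits(count, elem_size, buf_bytes));
    return (int)count;
}
```

Assuming r = 4 and a 4-byte int, PB holds 64 bytes: a count of r*r describes exactly those 64 bytes, while a count of r*r*sizeof(int) describes 256. In the question's code the call would then read MPI_Get(PB, n, MPI_INT, 0, 0, n, MPI_INT, win_PB) with n = checked_count(r*r, sizeof(int), r*r*sizeof(int)), still placed between the two MPI_Win_fence calls.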
