In my code, an arbitrary number of processes exchange parts of their local vectors. The local vectors are vectors of pairs, so I use an MPI derived datatype. A process does not know in advance how many elements its neighbours will send, so I send the buffer size first. Each process exchanges data with the processes of rank myrank-1 and myrank+1, wrapping around at the ends: process 0 uses rank comm_size-1 in place of myrank-1, and process comm_size-1 uses rank 0 in place of myrank+1. This is my code:
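// Ring neighbours: slpartner is the next rank, rlpartner the previous one.
int slpartner = (rank + 1) % p;
int rlpartner = (rank - 1 + p) % p;
// Exchange 1: send "last" to the next rank, receive "first1" from the previous one.
unsigned int size1tobesent = last.size(); // buffer size
unsigned int sizereceived1;
MPI_Sendrecv(&size1tobesent, 1, MPI_UNSIGNED, slpartner, 0,
             &sizereceived1, 1, MPI_UNSIGNED, rlpartner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Vect first1(sizereceived1);
// data() is used instead of &v[0], which is undefined for an empty vector
// (this assumes Vect is a std::vector).
MPI_Sendrecv(last.data(), static_cast<int>(size1tobesent), mytype, slpartner, 0,
             first1.data(), static_cast<int>(sizereceived1), mytype, rlpartner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
// Exchange 2: send "first" to the previous rank, receive "last1" from the next one.
unsigned int size2tobesent = first.size(); // buffer size 2
unsigned int sizereceived2;
MPI_Sendrecv(&size2tobesent, 1, MPI_UNSIGNED, rlpartner, 0,
             &sizereceived2, 1, MPI_UNSIGNED, slpartner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
Vect last1(sizereceived2);
MPI_Sendrecv(first.data(), static_cast<int>(size2tobesent), mytype, rlpartner, 0,
             last1.data(), static_cast<int>(sizereceived2), mytype, slpartner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);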
When I run this with 2 or 3 processes, everything works as expected. With more than 3, the results are unpredictable. I don't know whether this is caused by a particular combination of input data or by a conceptual error that I'm missing. Note also that this code runs inside a for loop.
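One way to rule out the partner arithmetic as the cause is a small check that needs no MPI at all. The sketch below recomputes the two partners for every rank and verifies that each send has a matching receive on the other side. `RingPartners`, `ring_partners` and `ring_is_consistent` are hypothetical names introduced only for this illustration; the only thing taken from the code above is the pair of formulas for `slpartner` and `rlpartner`.

```cpp
#include <cassert>

// Hypothetical helper type (not in the original code).
struct RingPartners {
    int send_to;    // same formula as slpartner
    int recv_from;  // same formula as rlpartner
};

// Neighbours of `rank` in a ring of `p` processes.
RingPartners ring_partners(int rank, int p) {
    assert(p > 0 && rank >= 0 && rank < p);
    return { (rank + 1) % p, (rank - 1 + p) % p };
}

// Returns true when both exchanges are correctly paired for every rank:
//  - exchange 1: r sends to send_to, which must expect to receive from r;
//  - exchange 2: r sends to recv_from, which must expect to receive from
//    its own send_to, and that must be r.
bool ring_is_consistent(int p) {
    for (int r = 0; r < p; ++r) {
        RingPartners mine = ring_partners(r, p);
        RingPartners right = ring_partners(mine.send_to, p);
        if (right.recv_from != r) return false;
        RingPartners left = ring_partners(mine.recv_from, p);
        if (left.send_to != r) return false;
    }
    return true;
}
```

If `ring_is_consistent(p)` returns true for the process counts that misbehave (for example 4 to 8), the pairing itself is probably sound, and the cause is more likely in the data being exchanged or in the surrounding loop.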