
N is 4, and so is N_glob; the two happen to be equal. p is 4.

Here is a small portion of the code:

float **global_grid;
float **gridPtr; 
lengthSubN = N/pSqrt;
subN = lengthSubN + 2;
grid = allocate2D(grid, subN, subN);
..
MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);
MPI_Type_commit(&rowType);
..
gridPtr = grid;
..
MPI_Barrier(MPI_COMM_WORLD);
if(id == 0) {
    global_grid = allocate2D(global_grid, N_glob, N_glob);
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Gather(&(gridPtr[0][0]), 1, rowType,
           &(global_grid[0][0]), 1, rowType, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if(id == 0)
    print(global_grid, N_glob, N_glob);

Here I have p submatrices, and I am trying to gather them all into the global matrix, which lives on the root process. However, the program just crashes. Any ideas?

I am receiving a seg fault:

BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
PID 29058 RUNNING AT linux16
EXIT CODE: 139
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
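The snippet calls an allocate2D helper that is not shown. Passing `&grid[0][0]` (or `grid[0]`) as an MPI buffer is only safe when every row lives in one contiguous block, because MPI reads and writes raw memory starting at that single address. Below is a minimal sketch of such a helper. It assumes the `allocate2D(ptr, rows, cols)` call shape used above; the unused first argument only mirrors that shape, and `free2D` is a hypothetical companion. The real implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical contiguous allocator matching the call shape
 * grid = allocate2D(grid, rows, cols) used in the question.
 * One malloc holds the row-pointer table and a second one holds all
 * rows*cols floats, so A[0] == &A[0][0] is the start of one contiguous
 * block that MPI can read from or write into. */
float **allocate2D(float **unused, int rows, int cols)
{
    (void)unused; /* kept only to mirror the question's signature */
    assert(rows > 0 && cols > 0);

    float **A = malloc((size_t)rows * sizeof *A);       /* row pointers */
    if (A == NULL)
        return NULL;

    float *data = malloc((size_t)rows * (size_t)cols * sizeof *data);
    if (data == NULL) {
        free(A);
        return NULL;
    }

    for (int i = 0; i < rows; ++i)
        A[i] = data + (size_t)i * (size_t)cols;           /* row i */
    return A;
}

/* Hypothetical counterpart that releases both blocks. */
void free2D(float **A)
{
    if (A != NULL) {
        free(A[0]); /* contiguous data block */
        free(A);    /* row-pointer table */
    }
}
```

With this layout, `grid[0]` and `&grid[0][0]` are the same address (the start of the data), while `&grid[0]` is the address of the row-pointer table, which is not matrix data at all.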


EDIT:

I found the question "MPI_Gather segmentation fault" and initialized global_grid to NULL, but that did not help. However, if I do:

//if(id == 0) {
    global_grid = allocate2D(global_grid, N_glob, N_glob);
//}

then everything works. But shouldn't the global matrix live only in the root process?


EDIT_2:

If I do:

if(id == 0) {
    global_grid = allocate2D(global_grid, N_glob, N_glob);
} else {
    global_grid = NULL;
}

then it crashes here:

MPI_Gather(&gridPtr[0][0], 1, rowType,
           global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
gsamaras
  • Have you tried using a debugger? – EkcenierK Dec 30 '15 at 20:31
  • Hmm @KLibby I am updating. – gsamaras Dec 30 '15 at 20:34
  • Where does it _hang_? At what line of code? Do you mean the function never returns? – ryyker Dec 30 '15 at 20:34
  • Yes @ryyker, the function does not return. – gsamaras Dec 30 '15 at 20:36
  • @Gernot1976 I wish you were right, but no. I updated the question; I tried to save some space there, sorry :) – gsamaras Dec 30 '15 at 21:06
  • Did you initialize `gridPtr`? – Martin Zabel Dec 30 '15 at 21:26
  • @MartinZabel each `gridPtr` has the expected values in its respective 4x4 submatrix. :/ – gsamaras Dec 30 '15 at 21:30
  • There is no 2D array in your code. – too honest for this site Dec 30 '15 at 21:30
  • @Olaf when I say a 2D array, I mean the double pointers that are later malloc'ed. The pointers are declared in the very first lines of my code. – gsamaras Dec 30 '15 at 21:32
  • A pointer is not an array and vice versa. Don't be misled by the indexing operator, which is overloaded for pointers and arrays. They have different semantics. A pointer to a true 2D array (i.e. a matrix) would look like `int (*ip)[4]` (pointer to the first row of the array), or `int (*p2da)[4][5]` (true pointer to 2D array, but less intuitive to use). – too honest for this site Dec 30 '15 at 21:33
  • I see @Olaf, but it's still not clear to me how I should fix the code. :/ Any insight, please? – gsamaras Dec 30 '15 at 21:34
  • Just to be sure: you initialized `grid` but not `gridPtr` in line 5. – Martin Zabel Dec 30 '15 at 21:35
  • `gridPtr` is a pointer to the grid, thus its initialization is this: `gridPtr = grid;`. I also updated my question, thanks @MartinZabel! – gsamaras Dec 30 '15 at 21:37
  • TL;DR for today, sorry. But segfaults are typical of problems with (multiple) indirection. I'd check in that direction. – too honest for this site Dec 30 '15 at 21:37
  • I am trying @Olaf, thanks for the help so far! – gsamaras Dec 30 '15 at 21:38
  • A downvote for what? I keep improving this question SO hard! The post is rather long and you say that I should add more code?! I have included all the relevant parts! If I have forgotten any, please let me know! – gsamaras Dec 30 '15 at 21:41
  • I didn't downvote it. But the relationship between `N` and `N_glob` is missing. Also note that you gather only one row from each rank. Is this intended? – Martin Zabel Dec 30 '15 at 21:50
  • @MartinZabel thanks, I updated! Well, the goal is ultimately to gather all the submatrices into the global matrix, but I wanted to start somewhere! – gsamaras Dec 30 '15 at 21:53
  • You should remove EDIT_2, _3 and _4 because the pointer arithmetic is broken. `&(gridPtr[0])` as well as `&(global_grid[0])` return a pointer to the first row-pointer allocated at `A = malloc(M * sizeof (float*));` within `allocate2d`. That is not the data space. You should try `gridPtr[0]` and `global_grid[0]` instead, because these actually point to the first row in the data space. – Martin Zabel Dec 30 '15 at 22:22
  • @MartinZabel good idea, but it didn't work either, see my edit! – gsamaras Dec 31 '15 at 09:23
  • `global_grid` is already a pointer. Just do `if(id==0) { global_grid = allocate2d(...); } else {global_grid = NULL;}` – Martin Zabel Dec 31 '15 at 09:59
  • @gsamaras: Imagine someone in the future is having a similar problem and search turns up this question. Do you think they will have *any* hope of understanding it? This is a Stack Overflow question, not a changelog or your personal interactive debugging session. Could you do something about this, please? – talonmies Dec 31 '15 at 11:00
  • @talonmies you are right. I updated my question, hope it's better now. If you have any suggestions please let me know. – gsamaras Dec 31 '15 at 11:06
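Olaf's point about true 2D arrays can be illustrated with a pointer to the first row, the `float (*)[N]` form. Below is a minimal C sketch. It assumes the global size is known at compile time: `N_GLOB`, `row_t` and `alloc_grid` are hypothetical names standing in for `N_glob` and a helper. A single `malloc` yields one contiguous block, so `&g[0][0]` is a valid MPI buffer and one `free` releases everything.

```c
#include <assert.h>
#include <stdlib.h>

#define N_GLOB 4               /* hypothetical compile-time stand-in for N_glob */

typedef float row_t[N_GLOB];   /* one matrix row */

/* Allocates a rows x N_GLOB matrix as a single contiguous block and
 * returns a pointer to its first row (the float (*)[N_GLOB] form).
 * g[i][j] indexes it like a 2D array; free(g) releases it. */
static row_t *alloc_grid(size_t rows)
{
    assert(rows > 0);
    return malloc(rows * sizeof(row_t));
}
```

Usage would be `row_t *g = alloc_grid(N_GLOB);` on the root, passing `&g[0][0]` as the receive buffer. The trade-off against the pointer-to-pointer layout is that the row length must be a constant (or, in C99, a variably modified type).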

1 Answer


The variable global_grid is not initialized on ranks other than rank 0. Thus, this expression

&(global_grid[0][0])

or this one:

global_grid[0]

leads to a segmentation fault on those ranks, because both forms read the first row pointer global_grid[0], i.e. they dereference global_grid, which there is uninitialized (or NULL).

Just make two calls to MPI_Gather, one for rank 0 and one for all other ranks:

if(id == 0) {
    MPI_Gather(gridPtr[0], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
} else {
    MPI_Gather(gridPtr[0], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
}
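Because MPI_Gather uses the receive buffer, count and type only at the root, a single call on every rank should also work. This holds as long as the receive pointer is computed without dereferencing global_grid off the root. Below is a minimal sketch: `gather_recvbuf` is a hypothetical helper. It assumes global_grid is set to NULL on non-root ranks (as in EDIT_2), so no uninitialized pointer is ever read, and that the root's matrix is contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start of the contiguous receive buffer on the root,
 * NULL on every other rank. global_grid is dereferenced only when
 * id == root, so it may be NULL elsewhere. */
static float *gather_recvbuf(float **global_grid, int id, int root)
{
    if (id != root)
        return NULL;              /* recvbuf is ignored off the root */
    assert(global_grid != NULL);  /* the root must have allocated it */
    return global_grid[0];        /* first row of the data block */
}

/* Intended use, identical on every rank (MPI itself not compiled here):
 *
 *   MPI_Gather(gridPtr[0], 1, rowType,
 *              gather_recvbuf(global_grid, id, 0), 1, rowType,
 *              0, MPI_COMM_WORLD);
 */
```

The design choice is simply to move the `id == 0` test from around the MPI call into the argument, so all ranks execute the same collective call.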
Martin Zabel
  • You are right Martin, thanks! So what should my `MPI_Gather()` call look like? – gsamaras Dec 31 '15 at 10:05
  • @gsamaras Added call to `MPI_Gather`. – Martin Zabel Dec 31 '15 at 10:08
  • Martin, thanks, that did the trick. Now I have to actually send the data I need to the global matrix, so I need some explanation. Does only the first MPI_Gather do the work, while the one in the else branch does nothing? After experimenting a bit, it seems not. I also asked a follow-up question here: http://stackoverflow.com/questions/34545278/mpi-gather-the-central-elements-into-a-global-matrix – gsamaras Dec 31 '15 at 10:51
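On the question in the last comment: the call in the else branch is not a no-op. The send arguments are used on every rank, and only the receive arguments are ignored away from the root. The data movement of MPI_Gather for contiguous types can be modeled serially as below. `model_gather` is a hypothetical illustration, not an MPI function, and it ignores communication entirely.

```c
#include <assert.h>
#include <string.h>

/* Serial model of MPI_Gather's data movement (contiguous types only):
 * the block from rank r lands at offset r * count in the root's receive
 * buffer, so every rank, root included, contributes its send buffer. */
static void model_gather(const float *const sendbufs[], int nranks,
                         int count, float *recvbuf)
{
    assert(nranks > 0 && count > 0);
    for (int r = 0; r < nranks; ++r)
        memcpy(recvbuf + (size_t)r * count, sendbufs[r],
               (size_t)count * sizeof *recvbuf);
}
```

With the question's sizes (N = 4, p = 4, so lengthSubN = 2), gathering one rowType from each of the 4 ranks delivers 8 floats into a contiguous 4x4 global_grid. Ranks 0 and 1 fill row 0, and ranks 2 and 3 fill row 1.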