It seems that the runtime behaviour of fftw_mpi_plan_dft_r2c_3d is strongly affected by its first three arguments, i.e. the transform dimensions. The following code is copied almost verbatim from the FFTW documentation. With L set to 512 and 48 processes, it fails with a segmentation fault; changing L to 1024 makes everything work. I get the same result on my Linux server with fftw-3.3.3 and on my Mac with fftw-3.3.4.
#include <fftw3-mpi.h>
#include <stdio.h>

int main(int argc, char **argv){
    const ptrdiff_t L = 512, M = 512, N = 512;
    fftw_plan plan;
    double *rin;
    fftw_complex *cout;
    ptrdiff_t alloc_local, local_n0, local_0_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* get local data size and allocate */
    alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD, &local_n0, &local_0_start);
    rin = fftw_alloc_real(2 * alloc_local);
    cout = fftw_alloc_complex(alloc_local);
    printf("allocated\n");

    /* create plan for out-of-place r2c DFT */
    plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD, FFTW_MEASURE);
    printf("plan made\n");

    /* release the plan and both buffers before shutting MPI down */
    fftw_destroy_plan(plan);
    fftw_free(rin);
    fftw_free(cout);
    MPI_Finalize();
    return 0;
}
This is the error on my Mac:
[MBP:01251] *** Process received signal ***
[MBP:01251] Signal: Segmentation fault: 11 (11)
[MBP:01251] Signal code: (0)
[MBP:01251] Failing at address: 0x0
[MBP:01251] [ 0] 0 libsystem_platform.dylib 0x00007fff8fcc8f1a _sigtramp + 26
[MBP:01251] [ 1] 0 mca_pml_ob1.so 0x0000000102eee422 append_frag_to_list + 594
[MBP:01251] *** End of error message ***