0

I am trying to parallelize the following loop:

#pragma omp parallel for private(j,i,mxy) firstprivate(in,out,p)
    for(int j = 0; j < Ny; j++) {
        //        #pragma omp parallel for private(i,mxy) firstprivate(in,my,j)
        for(int i = 0; i < Nx; i++){
            mxy   = i + j*Nx;
            in[i+1] = b_2D[mxy] + I*0.0 ;
        }
        fftw_execute(p);
        for(int i = 0; i < Nx; i++){
            mxy   = i + j*Nx;
            b_2D[mxy] = cimag(out[i+1]) ;
        }
    }

I do get a small speed up, but I keep getting a different result regardless of what variables I set to private and firstprivate. I believe this is correct how I have done it, but why am I getting a different result than when I run this in series?

I have tried the following:

fftw_make_planner_thread_safe();
fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

#pragma omp parallel private(j,i,mxy) firstprivate(in,out)
{
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    for( j = 0; j < N; j++)
        in[j] = 0.0;

#pragma omp for
    for( j = 0; j < Ny; j++) {
        for( i = 0; i < Nx; i++)
            in[i+1] = b_2D[i + j*Nx] + I*0.0;
        fftw_execute(p);
        for( i = 0; i < Nx; i++)
            b_2D[i + j*Nx] = cimag(out[i+1]) ;
    }

    fftw_destroy_plan(p);
}
fftw_free(in);
fftw_free(out);

This give me the error: "Segmentation fault: 11"

If I run this:

    fftw_make_planner_thread_safe();
#pragma omp parallel private(j,i,mxy)
{
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    for( j = 0; j < N; j++)
        in[j] = 0.0;

#pragma omp for
    for( j = 0; j < Ny; j++) {
        for( i = 0; i < Nx; i++)
            in[i+1] = b_2D[i + j*Nx] + I*0.0;
        fftw_execute(p);
        for( i = 0; i < Nx; i++)
            b_2D[i + j*Nx] = cimag(out[i+1]) ;
    }

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
}

I get this error again: "Segmentation fault: 11" but I run again and it says:

solver(9674,0x7fff74e22000) malloc: *** error for object 0x7f8d70f00410: double free
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
user906357
  • 4,575
  • 7
  • 28
  • 38
  • Why `i+1` for the indices of `in` and `out` ? – Paul R Oct 03 '16 at 16:02
  • The size of in and out are 2*Nx + 2, with the remaining entries 0. This ensures in[0]=0 – user906357 Oct 03 '16 at 16:08
  • 1
    Did you build/install a thread-safe version of FFTW ? See [this answer](http://stackoverflow.com/a/15013395/253056) which covers this and other related topics on using FFTW with OpenMP. – Paul R Oct 03 '16 at 16:13
  • I believe so as I just added the function fftw_make_planner_thread_safe() to this script and it gave no compiling errors. But it also did not give the correct result – user906357 Oct 03 '16 at 16:26
  • What's "p"? It seems suspicious that it is not modified inside the loop, but fftw_execute is invoked once per iteration, so unless something magic is going it certainly looks as if you're just doing the same FFT each time... (he says with no knowledge of FFTW :-)). – Jim Cownie Oct 04 '16 at 10:31
  • I updated the question to show what p is – user906357 Oct 05 '16 at 22:33

2 Answers2

1

You are calling FFTW with the same plan p in all threads. Since the plan includes the location of the input and output buffers (the ones supplied to the fftw_plan_dft_whatever plan constructor), all concurrent calls to fftw_execute will utilise those same buffers and not the private copies. The solution is to construct a separate plan for each thread:

#pragma omp parallel private(j,i,mxy) firstprivate(in,out)
{
    // The following OpenMP construct enforces thread-safety
    // Remove if the plan constructor is thread-safe
    #pragma omp critical (plan_ops)
    fftw_plan my_p = fftw_plan_dft_whatever(..., in, out, ...);
    // my_p now refers the private in and out arrays

    #pragma omp for
    for(int j = 0; j < Ny; j++) {
        for(int i = 0; i < Nx; i++){
            mxy   = i + j*Nx;
            in[i+1] = b_2D[mxy] + I*0.0 ;
        }
        fftw_execute(my_p);
        for(int i = 0; i < Nx; i++){
            mxy   = i + j*Nx;
            b_2D[mxy] = cimag(out[i+1]) ;
        }
    }

    // See comment above for the constructor operation
    #pragma omp critical (plan_ops)
    fftw_destroy_plan(my_p);
}
Hristo Iliev
  • 72,659
  • 12
  • 135
  • 186
  • Thank you for the answer! This leads to "Segmentation fault: 11" – user906357 Oct 05 '16 at 19:03
  • `in` and `out` are probably pointers and not static arrays. In that case, the private copies are not really private. Allocate the private buffers in each tread before the plan constructor and free them after the plan destructor. – Hristo Iliev Oct 05 '16 at 21:15
  • That leads to another error, I'll post an edit to show what I have tried – user906357 Oct 05 '16 at 22:20
  • Check the modified answer. If it still doesn't work, then `fftw_execute` is not thread-safe and that's a problem. Also, make sure `N` is greater than or equal to `Nx+1`. – Hristo Iliev Oct 07 '16 at 08:56
  • It appears I am having more troubles because this is not working either. I will try to narrow it down and ask another question. – user906357 Oct 07 '16 at 19:21
0

The root cause should be this patch isn't backported to fftw-3.3.5 version, and I think you should merge the patch yourself. You can also refer the discussion here.

Nan Xiao
  • 16,671
  • 18
  • 103
  • 164