C2R FFT on CUDA (CUFFT) produces different results than FFTW

Question

I'm working over transforming some code from using FFTW library to CUFFT (CPU computing to GPU computing). I need to transform a matrix of forces, make some math on it and transform it back. Operation in FFTW looks like it:

fftw_real u0[DIM * 2*(DIM/2+1)], v0[DIM * 2*(DIM/2+1)];

static rfftwnd_plan plan_rc, plan_cr;

void init_FFT(int n) {
  plan_rc = rfftw2d_create_plan(n, n, FFTW_REAL_TO_COMPLEX, FFTW_IN_PLACE);
  plan_cr = rfftw2d_create_plan(n, n, FFTW_COMPLEX_TO_REAL, FFTW_IN_PLACE);
}

#define FFT(s,u)\
  if(s==1) rfftwnd_one_real_to_complex(plan_rc,(fftw_real *)u,(fftw_complex*)u);\
  else rfftwnd_one_complex_to_real(plan_cr,(fftw_complex *)u,(fftw_real *)u)

and finally:

FFT(1,u0);
FFT(1,v0);

//math
...

//and transforming back
FFT(-1,u0); 
FFT(-1,v0);

After moving to CUFFT:

#define OURARRAYSIZE (DIM * 2*(DIM/2+1))
#define DIM 16

cufftHandle planR2C;
cufftHandle planC2R;
cufftReal forcesX[OURARRAYSIZE];
cufftReal forcesY[OURARRAYSIZE];
cufftReal  *dev_forcesX;
cufftReal  *dev_forcesY;

Init:

cufftPlan2d(&planR2C, DIM, DIM, CUFFT_R2C);
cufftPlan2d(&planC2R, DIM, DIM, CUFFT_C2R);
cufftSetCompatibilityMode(planR2C, CUFFT_COMPATIBILITY_FFTW_ALL);
cufftSetCompatibilityMode(planC2R, CUFFT_COMPATIBILITY_FFTW_ALL);
cudaMalloc( (void**)&dev_forcesX, OURARRAYSIZE*sizeof(cufftReal) );
cudaMalloc( (void**)&dev_forcesY, OURARRAYSIZE*sizeof(cufftReal) );

And finally:

cufftExecR2C(planR2C, (cufftReal*) dev_forcesX, (cufftComplex*)dev_forcesX);
cufftExecR2C(planR2C, (cufftReal*) dev_forcesY, (cufftComplex*)dev_forcesY);


cudaMemcpy( forcesX, dev_forcesX, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyDeviceToHost );
cudaMemcpy( forcesY, dev_forcesY, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyDeviceToHost );

diffuseVelocity(velocitiesX, velocitiesY, forcesX, forcesY);//MATH PART

cudaMemcpy( dev_forcesX, forcesX, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyHostToDevice );
cudaMemcpy( dev_forcesY, forcesY, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyHostToDevice );

cufftExecC2R(planC2R, (cufftComplex*) dev_forcesX, (cufftReal*)dev_forcesX);
cufftExecC2R(planC2R, (cufftComplex*) dev_forcesY, (cufftReal*)dev_forcesY);

cudaMemcpy( forcesX, dev_forcesX, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyDeviceToHost );
cudaMemcpy( forcesY, dev_forcesY, OURARRAYSIZE*sizeof(cufftReal), cudaMemcpyDeviceToHost );

After the math part both programs hold exactly the same data (matrix). Sadly after the reverse fourier transformation data in matrices differs. I noticed that corrupted data is that, which lies in bonus columns ( (DIM * 2*(DIM/2+1)) ) which are needed for in place transformation.

Does anybody have any idea, why? Is there something about CUFFT that i don't know?

For background, which CUDA version are you using, and what GPU / OS platform are you on? — njuffa, Sep 28 '12 at 21:14
It seems likely that this is a bug. You could try the CUDA 5.0 RC that is avalable to the public, but I would recommend filing a bug through the registered developer website. Please attach your repro case (it would be helpful to simplify it as much as possible). Go to http://developer.nvidia.com/cuda/cuda-toolkit and look for the green links in the text "Members of the CUDA Registered Developer Program can report issues and file bugs | Login or Join Today" at about the middle of the page. — njuffa, Sep 28 '12 at 22:08
Spoke to someone who knows a bit about CUFFT and he pointed out that you need to set the right pitch if there is "padding" at the end of the rows (your bonus columns would fit that description). Supposedly there is a way to handle this in CUFFT, but it may not be well documented and I don't know the details. Maybe someone knowledgable about CUFFT can provide details of the applicable API and what to pass. — njuffa, Sep 28 '12 at 22:21
There is sth like: cufftSetCompatibilityMode(planC2R, CUFFT_COMPATIBILITY_FFTW_PADDING); but it doesn't help. Without ALL or PADDING compatibility data in returned matrix is a trash. With it, data in matrix is correct except some numbers in padding columnes. As i know i need them anyway, cause they're used later in the code and with these from CUFFT application doesn't work properly. — aerion, Sep 29 '12 at 13:23
I think the best thing to do here is to file a bug. If there is a real functional issue wih CUFFT, it will get fixed. If the problem is lack documentation or example code showing how to deal with a use case like yours, that will get fixed, too. — njuffa, Sep 29 '12 at 16:44
It turned out that there was no bug at all. I made mistake somewhere else. Matrices differs cause they can. After reversed FFT padding columnes contain some random values and these values differs through libraries. Anyway thanks. You helped me understand padding and find my own bug. — aerion, Oct 03 '12 at 15:29

C2R FFT on CUDA (CUFFT) produces different results than FFTW

0 Answers0