I am performing a batched 2D FFT on 128 images of size 128 x 128 using the CUFFT library. This is how I use the library:
unsigned int nx = 128; unsigned int ny = 128; unsigned int nz = 128;
// Make 2D fft batch plan
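cufftHandle plan;
// Explicit casts: brace-initializing int from unsigned int is a narrowing conversion
int n[2] = {(int)nx, (int)ny};
int inembed[] = {(int)nx, (int)ny};
int onembed[] = {(int)nx, (int)ny};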
cufftPlanMany(&plan,
2, // rank
n, // dimension
inembed,
1, // istride
nx * ny, // idist
onembed,
1, //ostride
nx * ny, // odist
CUFFT_D2Z,
nz);
cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE);
// Create output array
complex<double>* out_complex = new complex<double>[nx * ny * nz];
// Initialize output array
for (unsigned int i = 0; i < nx * ny * nz; i++) {
out_complex[i].real(0);
out_complex[i].imag(0);
}
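// Device buffers for the real input and the complex output
cufftDoubleReal* idata;
cufftDoubleComplex* odata;
cudaMalloc( (void**)&idata, sizeof(cufftDoubleReal) * nx * ny * nz );
cudaMalloc( (void**)&odata, sizeof(cufftDoubleComplex) * nx * ny * nz );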
cudaMemcpy( idata, in_real, nx * ny * nz * sizeof(cufftDoubleReal),
cudaMemcpyHostToDevice );
cudaMemcpy( odata, out_complex, nx * ny * nz * sizeof(cufftDoubleComplex),
cudaMemcpyHostToDevice );
cufftExecD2Z( plan, idata, odata );
cudaMemcpy( out_complex, odata, nx * ny * nz * sizeof(cufftDoubleComplex),
cudaMemcpyDeviceToHost );
The input in_real on the host is a large double array holding the stack of images. I assume there is no problem converting between double and cufftDoubleReal, or between complex<double> and cufftDoubleComplex? I am a little suspicious of the way I created the plan and chose its parameters. I looked for examples online, but they were neither helpful nor consistent, so I set the parameters according to my own understanding of the programming guide.
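The type-conversion assumption can be checked on the host without a GPU. The following is a minimal sketch, not CUFFT code: Double2Like is a hypothetical stand-in for cufftDoubleComplex, assumed to be a plain struct of two doubles (x for the real part, y for the imaginary part), and reinterpret_copy is an illustrative helper name. In a real build one would place the same static_assert against cufftDoubleComplex from cufft.h.

```cpp
#include <cassert>
#include <complex>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cufftDoubleComplex.
// Assumption: two doubles, x = real part, y = imaginary part.
struct Double2Like {
    double x;
    double y;
};

// std::complex<double> is specified to be layout-compatible with double[2],
// so the two types should have the same size.
static_assert(sizeof(Double2Like) == sizeof(std::complex<double>),
              "stand-in and std::complex<double> must have the same size");

// Copy a buffer of std::complex<double> into the stand-in type byte for byte,
// the same way cudaMemcpy moves the bytes between host and device.
std::vector<Double2Like> reinterpret_copy(const std::vector<std::complex<double>>& src) {
    std::vector<Double2Like> dst(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(Double2Like));
    }
    return dst;
}
```

If the real and imaginary parts survive the byte copy, the element types themselves are unlikely to be the cause of the wrong output, which leaves the plan parameters as the more probable suspect.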
As indicated by the title, the output is only partially correct: the left half-plane matches, but the right half-plane is all zeros, which confuses me. I tried different compatibility modes, but that did not help. The reference I am comparing against is MATLAB's fft2().
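One thing worth checking when comparing against fft2(): the CUFFT documentation states that real-to-complex transforms write only the non-redundant half of the spectrum, that is ny/2 + 1 complex values along the last dimension, because the rest follows from Hermitian symmetry. The sketch below is host-only C++ and does not call CUFFT. It assumes row-major storage, and naive_dft2, take_half and expand_half are hypothetical helper names. It shows how a full nx x ny spectrum could be rebuilt from such a half spectrum before comparing it with a full-spectrum reference.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cplx;

// Naive full 2D DFT of a real nx-by-ny row-major image (slow reference, O(n^4)).
std::vector<cplx> naive_dft2(const std::vector<double>& in, int nx, int ny) {
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(static_cast<std::size_t>(nx) * ny);
    for (int kx = 0; kx < nx; ++kx) {
        for (int ky = 0; ky < ny; ++ky) {
            cplx sum(0.0, 0.0);
            for (int x = 0; x < nx; ++x) {
                for (int y = 0; y < ny; ++y) {
                    const double phase =
                        -two_pi * (double(kx) * x / nx + double(ky) * y / ny);
                    sum += in[x * ny + y] * cplx(std::cos(phase), std::sin(phase));
                }
            }
            out[kx * ny + ky] = sum;
        }
    }
    return out;
}

// Keep only the first ny/2 + 1 columns of each row (packed half-spectrum layout).
std::vector<cplx> take_half(const std::vector<cplx>& full, int nx, int ny) {
    const int nyh = ny / 2 + 1;
    std::vector<cplx> half(static_cast<std::size_t>(nx) * nyh);
    for (int kx = 0; kx < nx; ++kx)
        for (int ky = 0; ky < nyh; ++ky)
            half[kx * nyh + ky] = full[kx * ny + ky];
    return half;
}

// Rebuild the full spectrum of a real image from its packed half using
// X[kx][ky] = conj(X[(nx - kx) % nx][(ny - ky) % ny]).
std::vector<cplx> expand_half(const std::vector<cplx>& half, int nx, int ny) {
    const int nyh = ny / 2 + 1;
    std::vector<cplx> full(static_cast<std::size_t>(nx) * ny);
    for (int kx = 0; kx < nx; ++kx) {
        for (int ky = 0; ky < ny; ++ky) {
            if (ky < nyh) {
                full[kx * ny + ky] = half[kx * nyh + ky];
            } else {
                const int sx = (nx - kx) % nx;  // mirrored row index
                const int sy = ny - ky;         // mirrored column, always < nyh here
                full[kx * ny + ky] = std::conj(half[sx * nyh + sy]);
            }
        }
    }
    return full;
}
```

For the 128 x 128 case this would mean 65 stored columns per row and 63 reconstructed ones. Note that MATLAB stores arrays column-major, so the roles of rows and columns may be swapped relative to this row-major sketch.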