I've been struggling the whole day, trying to make a basic CUFFT example work properly. However i run into a little problem which I cannot identify. Basically I have a linear 2D array vx with x and y coordinates. Then I just calculate a forward then backward CUFFT (in-place), that simple. Then I copy back the array vx, normalize it by NX*NY , then display.
#define NX 32
#define NY 32
#define LX (2*M_PI)
#define LY (2*M_PI)
float *x = new float[NX*NY];
float *y = new float[NX*NY];
float *vx = new float[NX*NY];
for(int j = 0; j < NY; j++){
for(int i = 0; i < NX; i++){
x[j*NX + i] = i * LX/NX;
y[j*NX + i] = j * LY/NY;
vx[j*NX + i] = cos(x[j*NX + i]);
}
}
float *d_vx;
CUDA_CHECK(cudaMalloc(&d_vx, NX*NY*sizeof(float)));
CUDA_CHECK(cudaMemcpy(d_vx, vx, NX*NY*sizeof(float), cudaMemcpyHostToDevice));
cufftHandle planr2c;
cufftHandle planc2r;
CUFFT_CHECK(cufftPlan2d(&planr2c, NY, NX, CUFFT_R2C));
CUFFT_CHECK(cufftPlan2d(&planc2r, NY, NX, CUFFT_C2R));
CUFFT_CHECK(cufftSetCompatibilityMode(planr2c, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftSetCompatibilityMode(planc2r, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftExecR2C(planr2c, (cufftReal *)d_vx, (cufftComplex *)d_vx));
CUFFT_CHECK(cufftExecC2R(planc2r, (cufftComplex *)d_vx, (cufftReal *)d_vx));
CUDA_CHECK(cudaMemcpy(vx, d_vx, NX*NY*sizeof(cufftReal), cudaMemcpyDeviceToHost));
for (int j = 0; j < NY; j++){
for (int i = 0; i < NX; i++){
printf("%.3f ", vx[j*NX + i]/(NX*NY));
}
printf("\n");
}
When vx is defined as cos(x) or sin(x), it works fine, but when using sin(y) or cos(y), it gives me back the correct function (sin or cos) but with half amplitude (that is, oscillating between 0.5 and -0.5 instead of 1 and -1) ! Note that using sin(2*y) or cos(2*y) (or sin(4*y), cos(4*y), ...) works fine. Any idea?