I'm trying to run multiple FFTs in a loop to work around the 128-million-element limit of a cuFFT plan. For example, I would process the data in chunks of at most 128 million elements, one chunk per loop iteration.

My program works fine for a single FFT call, but looping doesn't seem to work. I think it may be because of how I offset the FFT. Here is a snippet of how I did it:

cufftComplex *d_signal;
checkCudaErrors(cudaMalloc((void **)&d_signal, mem_size));
cufftComplex *d_filter_kernel;
checkCudaErrors(cudaMalloc((void **)&d_filter_kernel, mem_size));

int rankSize = 2;
int rank[2];
rank[0] = TempSearchSizeY; rank[1] = TempSearchSizeX;
int FFTPlanSize = 500;
cufftHandle planinitial;
cufftResult r;
r = cufftPlanMany(&planinitial, rankSize, rank, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, FFTPlanSize);
int NrOfFFTRuns = ceil(loadsize / FFTPlanSize);
int FFTOffset = 0;

// Copy both inputs to the device.
checkCudaErrors(cudaMemcpy(d_signal, imageNew, sizeof(Complex)*TempSearchArea*loadsize, cudaMemcpyHostToDevice));
checkCudaErrors(cudaMemcpy(d_filter_kernel, tempNew, sizeof(Complex)*TempSearchArea*loadsize, cudaMemcpyHostToDevice));

// Run the batched plan in place on one chunk per iteration.
for (int a = 0; a < NrOfFFTRuns; a++){
    FFTOffset = FFTPlanSize*a;
    r = cufftExecC2C(planinitial, (cufftComplex *)&d_signal[FFTOffset], (cufftComplex *)&d_signal[FFTOffset], CUFFT_FORWARD);
    PrintFFTPlanStatus(r);
    r = cufftExecC2C(planinitial, (cufftComplex *)&d_filter_kernel[FFTOffset], (cufftComplex *)&d_filter_kernel[FFTOffset], CUFFT_FORWARD);
    PrintFFTPlanStatus(r);
    cout << "Run inital" << endl;
}

The above code returns the wrong result. Can someone help me figure out the problem?

EE_Guy
  • Where do you initialize rank? Please include a [mcve]. – havogt Jun 09 '16 at 08:53
  • Sorry, this is taken out of a much bigger code base. I have edited the rank initialization into the code. I will see if I can isolate the code in a single file and edit in the missing parts, if there are any. Though I was hoping it was just a syntax error somewhere that made it fail. – EE_Guy Jun 09 '16 at 09:07

1 Answer

I figured it out myself.

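I forgot to multiply the offset by the number of elements in each transform (TempSearchSizeY*TempSearchSizeX). The device pointers are indexed in complex elements, so each loop iteration has to skip a whole batch of transforms. The offset should be

offset = a * element size * batch size,

where element size is TempSearchSizeY*TempSearchSizeX and batch size is FFTPlanSize. My code only used

offset = a * batch size.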
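
As a minimal sketch of the corrected offset arithmetic, the helper below computes the element offset for a given loop iteration. The function name chunkOffset and its parameter names are hypothetical and not part of the original code or of cuFFT; it only assumes that the data is laid out as consecutive 2D transforms of sizeY*sizeX complex elements each.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cuFFT): element offset of the first
// transform processed in loop iteration `run`.
//   batchSize : transforms per cufftExecC2C call (FFTPlanSize in the question)
//   sizeY/X   : dimensions of one 2D transform (TempSearchSizeY/X)
std::size_t chunkOffset(std::size_t run, std::size_t batchSize,
                        std::size_t sizeY, std::size_t sizeX) {
    assert(batchSize > 0 && sizeY > 0 && sizeX > 0);
    // offset = a * element size * batch size
    const std::size_t elementsPerTransform = sizeY * sizeX;
    return run * batchSize * elementsPerTransform;
}
```

In the loop from the question, this would presumably replace the offset line with FFTOffset = chunkOffset(a, FFTPlanSize, TempSearchSizeY, TempSearchSizeX); before the two cufftExecC2C calls. Using std::size_t rather than int is a choice made here to avoid overflow for large inputs; the original code used int.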
talonmies
EE_Guy