
I am looking at the NVIDIA SDK convolution FFT example (for large kernels). I know the theory behind Fourier transforms and their FFT implementations (the basics at least), but I can't figure out what the following code does:

const int    fftH = snapTransformSize(dataH + kernelH - 1);
const int    fftW = snapTransformSize(dataW + kernelW - 1);

// ... GPU initialization code

printf("...creating R2C & C2R FFT plans for %i x %i\n", fftH, fftW);
        cufftSafeCall( cufftPlan2d(&fftPlanFwd, fftH, fftW, CUFFT_R2C) );
        cufftSafeCall( cufftPlan2d(&fftPlanInv, fftH, fftW, CUFFT_C2R) );

    printf("...uploading to GPU and padding convolution kernel and input data\n");
        cutilSafeCall( cudaMemcpy(d_Kernel, h_Kernel, kernelH * kernelW * sizeof(float), cudaMemcpyHostToDevice) );
        cutilSafeCall( cudaMemcpy(d_Data,   h_Data,   dataH   * dataW *   sizeof(float), cudaMemcpyHostToDevice) );
        cutilSafeCall( cudaMemset(d_PaddedKernel, 0, fftH * fftW * sizeof(float)) );
        cutilSafeCall( cudaMemset(d_PaddedData,   0, fftH * fftW * sizeof(float)) );

        padKernel(
            d_PaddedKernel,
            d_Kernel,
            fftH,
            fftW,
            kernelH,
            kernelW,
            kernelY,
            kernelX
        );

        padDataClampToBorder(
            d_PaddedData,
            d_Data,
            fftH,
            fftW,
            dataH,
            dataW,
            kernelH,
            kernelW,
            kernelY,
            kernelX
        );

I've never used the CUFFT library before, so I don't know what snapTransformSize does

(here's the code)

int snapTransformSize(int dataSize){
    int hiBit;
    unsigned int lowPOT, hiPOT;

    // Round dataSize up to the next multiple of 16
    dataSize = iAlignUp(dataSize, 16);

    // Find the index of the highest set bit
    for(hiBit = 31; hiBit >= 0; hiBit--)
        if(dataSize & (1U << hiBit)) break;

    // If dataSize is already a power of two, keep it as-is
    lowPOT = 1U << hiBit;
    if(lowPOT == (unsigned int)dataSize)
        return dataSize;

    // Otherwise round up to the next power of two, unless that would
    // exceed 1024, in which case round up to a multiple of 512 instead
    hiPOT = 1U << (hiBit + 1);
    if(hiPOT <= 1024)
        return hiPOT;
    else
        return iAlignUp(dataSize, 512);
}

nor why the padded buffers are initialized the way they are.

Can you provide me explanation links or answers please?

Robert Crovella
Marco A.

3 Answers


It appears to be rounding up the FFT dimensions to the next power of 2, unless the dimension would exceed 1024, in which case it's rounded up to the next multiple of 512.

Having rounded up the FFT size you then of course need to pad your data with zeroes to make it the correct size for the FFT.

Note that the reason we typically need to round up and pad for convolution is that each FFT dimension needs to be image_dimension + kernel_dimension - 1, which is not normally a convenient number such as a power of 2.

Paul R

What @Paul R says is correct. The reason the code does this is that the Fast Fourier Transform runs fastest when the transform size is a power of two. See the Cooley-Tukey algorithm.

Just make sure that you declare a matrix whose size is a power of two, and you should not need that generic safe implementation.

fabrizioM
  • Power of 2 is *not* necessary for all FFT implementations, and it seems that CUFFT can cope with non power of 2 for larger FFT sizes anyway, where it uses multiples of 512 instead. For convolution you can't usually make the FFT size a power of 2, because each dimension needs to be image_dimension + kernel_dimension - 1, hence the need for rounding up and padding. – Paul R Apr 01 '11 at 08:59
  • 1
    @fabrizioM: neither multiple of 2 nor power of 2 is necessary. An FFT can be implemented for any size which can be factored into small primes, e.g. FFTW works with factors of 2, 3, 5, 7. – Paul R Apr 01 '11 at 09:51
  • 1
    Yes, an FFT can be implemented for non-power-of-two, but you're missing the point: Powers of two can be computed a **lot** **faster**, so the code is trying to give you better performance by converting a 500-point FFT into a 512-point FFT, for example. – Die in Sente Apr 01 '11 at 23:28
  • Thank all of you for your answers, I'll study how this is implemented better but now I have a starting point to begin studying. Reading at your answers I suppose the CUFFT code might work for every kernel/matrix sizes.. if this assumption was true, I'd be DEFINITELY happy! – Marco A. Apr 02 '11 at 08:55

It is rounding up the FFT dimensions to the next power of 2; once a dimension would exceed 1024, it is rounded up to the next multiple of 512 instead. You should pad the data with zeroes to make it the correct size for the FFT.