I studied the Cooley-Tukey algorithm and I understand it. I can follow everything in the CUDA convolutionFFT2D example up to these kernels:
spProcess2D calls spProcess2D_kernel, which in turn makes many calls to spPostprocessC2C, mulAndScale and spPreprocessC2C.
Here's the complete code: http://nopaste.info/30c13e44fe.html (convolutionFFT2D.cu, which contains the spProcess2D function) and http://nopaste.info/78d22afac2.html (convolutionFFT2D.cuh, which contains the other functions)
I have already read all the NVIDIA SDK papers, but I still can't figure out what these functions do (they use twiddle factors, but nothing in them looks like a Cooley-Tukey algorithm).
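For comparison, this is roughly the shape I was expecting to find: a minimal host-side sketch of a recursive radix-2 Cooley-Tukey FFT. It is not taken from the SDK sample. The name fft_radix2 and the cplx alias are my own, and it assumes the input length is a power of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;  // illustrative alias, not from the SDK

// Recursive radix-2 decimation-in-time FFT, computed in place.
// Forward transform with twiddle factors exp(-2*pi*i*k/n).
// Assumption: a.size() is a power of two.
void fft_radix2(std::vector<cplx>& a) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert((n & (n - 1)) == 0 && "length must be a power of two");

    // Split into even-indexed and odd-indexed samples.
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }

    // Transform each half recursively.
    fft_radix2(even);
    fft_radix2(odd);

    // Butterfly: combine the halves using the twiddle factors.
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const cplx t = std::polar(1.0, angle) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}
```

The even/odd split followed by the butterfly loop is the pattern I recognise as Cooley-Tukey, and it is this pattern that I cannot find in spPostprocessC2C and spPreprocessC2C.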
Please help me if you can, or at least point me to where I can find an explanation.
Update: I found this link: http://cnx.org/content/m16336/latest/#uid38 Maybe these functions are performing a breadth-first algorithm? I can't say for sure yet, but the structure looks the same.