I am using PyTorch to compute a 3D convolution via the FFT. The current code does a forward FFT, then a pointwise multiplication in Fourier space, and finally an inverse FFT. The code works, but it is very slow compared to an optimized C++ CUDA implementation (roughly a factor of 5 slower).
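For reference, the three steps described above look roughly like the sketch below. This is not the actual code from the question. The function name `fft_conv3d` is hypothetical, and the sketch assumes real-valued inputs, circular (periodic) boundaries, and a kernel that is zero-padded at the end to the signal's shape.

```python
import torch


def fft_conv3d(signal: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Circular 3D convolution of real tensors over the last three dims.

    Hypothetical helper: the kernel is zero-padded (at the end of each
    axis) to the signal's spatial shape before transforming.
    """
    dims = (-3, -2, -1)
    shape = tuple(signal.shape[-3:])

    # Step 1: forward FFTs (real-to-complex, so only half the spectrum is stored).
    sig_f = torch.fft.rfftn(signal, dim=dims)
    ker_f = torch.fft.rfftn(kernel, s=shape, dim=dims)

    # Step 2: pointwise multiplication in Fourier space.
    prod_f = sig_f * ker_f

    # Step 3: inverse FFT back to real space; `s` restores the original shape.
    return torch.fft.irfftn(prod_f, s=shape, dim=dims)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(64, 64, 64, device=device)
    k = torch.randn(5, 5, 5, device=device)
    y = fft_conv3d(x, k)
    print(y.shape)
```

Each of the three steps here is a separate PyTorch call, which is the pattern the question is about.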

I think the main problem is that each PyTorch operation is handled by a separate CUDA kernel. Is it somehow possible to merge these operations? I have heard about cuFFTDx. Can it be used from Python? Or would CuPy help?

Thanks for any hints. Best wishes, Florian
