I'm in the process of accelerating some data analysis code with GPU and am currently doing some profiling and comparisons between the numpy.fft library and cuFFT (using the skcuda.fft wrapper).
I'm certain I'm just missing something obvious about the FFT implementation in cuFFT, but I'm struggling to find what it is in the cuFFT documentation.
To setup the problem I create 500 ms of data sampled at 100 MS/s with a few spectral components. Then, I declare the GPU arrays, the cufft plan (R2C) and run the fft with a subset of the data. Finally, I have a comparison with numpy.fft.rfft:
time = np.linspace(0, 500E-3, int(50E6))
freq = 100E6
data = np.sin(2*np.pi*time*1E6)
data += np.sin(2*np.pi*time*2E6 + 0.5)
data += np.sin(2*np.pi*time*3E6 - 0.1)
data += np.sin(2*np.pi*time*4E6 - 0.9)
data += np.sin(2*np.pi*time*1E6 - 1.9)
data += np.sin(2*np.pi*time*15E6 - 2.1)
data += np.sin(2*np.pi*time*20E6 - 0.3)
data += np.sin(2*np.pi*time*25E6 - 0.3)
nPtsFFT = int(2**13)
dDev = gp.GPUArray(nPtsFFT, np.float64)
dDev.set(data[:nPtsFFT])
rDev = gp.GPUArray(int(nPtsFFT/2+1), np.float64)
plan = cufft.Plan(nPtsFFT, np.float64, np.complex128)
cufft.fft(dDev, rDev, plan)
rHost = rDev.get()
freqs = np.fft.rfftfreq(nPtsFFT, 1/freq)
hfftRes = np.fft.rfft(data[:nPtsFFT])
plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs, np.abs(rHost), label='cufft')
plt.legend()
plt.show()
I naively assumed that these would be approximately equal, but what I find is that the cufft peaks are all shifted and every other point is lower than expected.
This reminded me of the output of scipy.fftpack.rfft, so checking the documentation there I found the interleaving of Re and Im parts. So, if I modify the plotting to:
plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs[:-1:2]/2, np.abs(rHost[:-1:2] + 1j*rHost[1::2]), label='cufft')
plt.legend()
plt.show()
I now get what I expect but only up to 25 MHz, when I should be able to get data up to 50 MHz given the sampling rate.
Is there a way to extract the data up to the Nyquist frequency from this transform?