I'm running the simple code below on a powerful server with several NVIDIA RTX A5000/6000 GPUs and CUDA 11.8. For some reason, the FFT on the GPU is much slower than on the CPU (by a factor of 200-800). Does anyone have an idea why that might be? I tried different GPUs, but the results stay approximately the same.
import sigpy as sp
import torch
import time

# 256x256 Shepp-Logan phantom as the test image
arr = sp.shepp_logan((256, 256))

# Time a single 2D FFT on the CPU
device = "cpu"
arr = torch.from_numpy(arr).to(device)
tic = time.perf_counter()
res = torch.fft.fft2(arr, dim=(-2, -1))
toc = time.perf_counter()
cpu_time = toc - tic

# Time a single 2D FFT on one of the GPUs (the transfer itself is not timed)
device = "cuda:5"
arr = arr.to(device)
tic = time.perf_counter()
res = torch.fft.fft2(arr, dim=(-2, -1))
toc = time.perf_counter()
gpu_time = toc - tic

print(f"CPU time: {cpu_time}, GPU time: {gpu_time} ratio: {gpu_time / cpu_time}")
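One thing I have not ruled out is that the snippet above times one-time GPU setup rather than the FFT itself: it measures only the very first GPU call, and as far as I understand CUDA work is launched asynchronously. Here is a rough sketch of a variant that runs a few warm-up calls, averages over several iterations, and calls torch.cuda.synchronize() before reading the clock. The helper name time_fft2, the iteration counts, and the random complex input (a stand-in for the phantom, so the snippet does not need sigpy) are my own choices for illustration.

```python
import time
import torch


def time_fft2(arr, n_warmup=3, n_iter=20):
    """Return the average wall-clock seconds per torch.fft.fft2 call on arr's device."""
    is_cuda = arr.device.type == "cuda"

    # Warm-up calls: the first GPU calls may include one-time setup costs
    # (assumption: CUDA context creation and FFT plan setup).
    for _ in range(n_warmup):
        torch.fft.fft2(arr, dim=(-2, -1))
    if is_cuda:
        torch.cuda.synchronize(arr.device)

    tic = time.perf_counter()
    for _ in range(n_iter):
        torch.fft.fft2(arr, dim=(-2, -1))
    if is_cuda:
        # Wait for all queued GPU work to finish before stopping the clock.
        torch.cuda.synchronize(arr.device)
    toc = time.perf_counter()
    return (toc - tic) / n_iter


# Random complex 256x256 input as a stand-in for sp.shepp_logan((256, 256)).
arr = torch.complex(torch.rand(256, 256), torch.rand(256, 256))

cpu_time = time_fft2(arr)
print(f"CPU time per call: {cpu_time:.3e} s")

# Only time the GPU if one is visible; "cuda" picks the current default device.
if torch.cuda.is_available():
    gpu_time = time_fft2(arr.to("cuda"))
    print(f"GPU time per call: {gpu_time:.3e} s, ratio: {gpu_time / cpu_time:.2f}")
```

If the ratio looks very different with this version, that would suggest the original numbers are dominated by startup overhead rather than by the transform. For a 256x256 input the GPU may still not come out ahead, so I would also try larger or batched inputs.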
Thanks!