1

The NVIDIA CUDA documentation for cuFFT says "These batched transforms have higher performance than single transforms" (http://docs.nvidia.com/cuda/cufft/index.html#ixzz57haP0Mtz), but it does not give anything quantitative. Is there any information about how large the speedup would be, compared with running single transforms inside a for loop?

JimBamFeng

1 Answer

1

Speedup will depend on the size of the transforms, the number of batches, the targeted hardware, and the CUDA Toolkit version. A large batch of small transforms will show more of a speedup than a small batch of large ones. Part of the speedup comes from avoiding launch overhead, so once the transforms are large enough that launch overhead is small compared to kernel execution time, you won’t see as much benefit. I believe that for very small transforms the library can pack several batches together and use the more (memory) efficient device functions.

I'm asking around to see if there are any white papers or other published reports. So far I haven't found any.
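In the absence of published numbers, the most reliable figure is one measured on your own hardware. Below is a minimal sketch of such a measurement, not a definitive benchmark: it times `batch` single-transform launches in a for loop against one batched launch over the same buffer. The default sizes (256 points, 4096 transforms), the in-place complex-to-complex layout, and the file name in the build comment are assumptions chosen for illustration; substitute the sizes you actually use.

```cpp
// Build (file name is an assumption): nvcc -O2 batch_bench.cu -o batch_bench -lcufft
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Abort with a message on any CUDA runtime or cuFFT error.
#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s (line %d)\n", cudaGetErrorString(e_), __LINE__); \
    exit(EXIT_FAILURE); } } while (0)
#define CUFFT_CHECK(call) do { cufftResult r_ = (call); if (r_ != CUFFT_SUCCESS) { \
    fprintf(stderr, "cuFFT error: %d (line %d)\n", (int)r_, __LINE__); \
    exit(EXIT_FAILURE); } } while (0)

int main(int argc, char** argv) {
    // Illustrative defaults; override on the command line: ./batch_bench <n> <batch>
    int n     = (argc > 1) ? atoi(argv[1]) : 256;   // points per transform
    int batch = (argc > 2) ? atoi(argv[2]) : 4096;  // number of transforms
    if (n <= 0 || batch <= 0) { fprintf(stderr, "n and batch must be positive\n"); return 1; }

    // One contiguous buffer holding all transforms; input values do not affect timing.
    size_t bytes = sizeof(cufftComplex) * (size_t)n * (size_t)batch;
    cufftComplex* data = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&data, bytes));
    CUDA_CHECK(cudaMemset(data, 0, bytes));

    // Plan for one transform, and a plan for all of them at once.
    cufftHandle single_plan, batched_plan;
    int dims[1] = { n };
    CUFFT_CHECK(cufftPlan1d(&single_plan, n, CUFFT_C2C, 1));
    CUFFT_CHECK(cufftPlanMany(&batched_plan, 1, dims,
                              NULL, 1, n,   // default (contiguous) input layout
                              NULL, 1, n,   // default (contiguous) output layout
                              CUFFT_C2C, batch));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Warm-up so one-time initialization is not counted.
    CUFFT_CHECK(cufftExecC2C(single_plan, data, data, CUFFT_FORWARD));
    CUFFT_CHECK(cufftExecC2C(batched_plan, data, data, CUFFT_FORWARD));
    CUDA_CHECK(cudaDeviceSynchronize());

    // Case 1: one launch per transform, inside a for loop.
    float loop_ms = 0.0f;
    CUDA_CHECK(cudaEventRecord(start));
    for (int i = 0; i < batch; ++i) {
        cufftComplex* p = data + (size_t)i * n;
        CUFFT_CHECK(cufftExecC2C(single_plan, p, p, CUFFT_FORWARD));
    }
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaEventElapsedTime(&loop_ms, start, stop));

    // Case 2: a single batched call covering every transform.
    float batched_ms = 0.0f;
    CUDA_CHECK(cudaEventRecord(start));
    CUFFT_CHECK(cufftExecC2C(batched_plan, data, data, CUFFT_FORWARD));
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    CUDA_CHECK(cudaEventElapsedTime(&batched_ms, start, stop));

    printf("n=%d batch=%d  loop: %.3f ms  batched: %.3f ms\n", n, batch, loop_ms, batched_ms);

    CUFFT_CHECK(cufftDestroy(single_plan));
    CUFFT_CHECK(cufftDestroy(batched_plan));
    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(data));
    return 0;
}
```

Running it over a range of `n` and `batch` values should show where the two timings converge on a given GPU and toolkit version. For steadier numbers, repeat each timed section several times and average the results.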

Mat Colgrove