The first call to cufftPlanMany() takes about 0.7 seconds, but all subsequent calls are fast. Is there any way to speed up the first call to cufftPlanMany()?

Maghraby
  • The cuFFT library has an initialization time associated with it. That is what you are experiencing. [This answer](http://stackoverflow.com/questions/31012941/cufft-is-1000x-slower-in-vs2013-cuda7-0-compared-to-vs2010-cuda4-2) may be of interest. I don't think you'll be able to avoid it. – Robert Crovella Sep 18 '15 at 21:44
  • You are right. I'm asking whether there is any way to avoid this initialization penalty. I tried making a dummy call to cufftPlanMany() with small parameters at startup, but it didn't help. – Maghraby Sep 18 '15 at 22:03
  • Or is there any library that doesn't suffer from such an initialization penalty and still gives good processing performance? – Maghraby Sep 18 '15 at 22:04

1 Answer

The first call to cufftPlanMany causes libcufft.so to be loaded. This in turn initializes the CUDA context if needed and loads all the kernels. It will always take some time, depending on the size of the library. 0.7 seconds is a bit excessive, and it will be reduced in the next version of cuFFT. We have also slightly reduced the time of each subsequent cufftPlan* call.

Why do you need low initialization time?

llukas
  • Thanks a lot for your answer. Actually, I'd like to achieve a large speedup compared to the non-GPU implementation. This initialization overhead significantly reduces the overall speedup I get. – Maghraby Nov 24 '15 at 22:05
  • Thank you llukas. If you know, could you please say which version you expect to include the FFT initialization improvement? – Ethan Brown Apr 13 '17 at 22:21
  • I'd need to check exactly which versions had bumps. r8.0 initializes for me in 0.25 seconds; is that what you see as well? – llukas Apr 14 '17 at 22:55