With Dynamic Parallelism in CUDA, you can implement recursive algorithms such as merge sort. I have implemented one, and my program doesn't work for inputs greater than blah.
My question is: how deep can the recursion tree go in such an implementation? Is there a limit? (My program works fine for smaller inputs.)

AmirSojoodi
http://stackoverflow.com/questions/14301903/cuda-5-x-on-kepler-dynamic-kernel-execution-and-maximum-recursion-depth – void_ptr Jan 03 '15 at 17:06
1 Answer
From *Professional CUDA C Programming*:

> The maximum nesting depth of dynamic parallelism is limited to 24, but in reality most kernels will be limited by the amount of memory required by the device runtime system at each new level . . .
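
To make the limit concrete, here is a minimal sketch of a recursive kernel that stops launching children before it reaches the quoted nesting depth. The names `kMaxNestingDepth`, `canLaunchChild`, `recurse`, and `deepestLevelReached` are hypothetical and not part of the question's code, and the guard assumes the host-launched grid counts as depth 0, which is a conservative reading of the limit rather than something the quote spells out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Nesting depth limit quoted above (assumption: the host-launched grid is depth 0).
constexpr int kMaxNestingDepth = 24;

// Hypothetical helper: true if a grid running at `depth` may still launch a child.
__host__ __device__ inline bool canLaunchChild(int depth) {
    return depth + 1 < kMaxNestingDepth;
}

// Hypothetical recursive kernel: one thread per level launches one child grid.
__global__ void recurse(int depth, int *deepest) {
    if (threadIdx.x != 0 || blockIdx.x != 0) return;

    atomicMax(deepest, depth);  // record the deepest level that actually ran

    if (canLaunchChild(depth)) {
        recurse<<<1, 1>>>(depth + 1, deepest);
        // A failed child launch (for example, device runtime memory exhausted)
        // is reported through the device-side error state.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            printf("child launch failed at depth %d\n", depth);
        }
    }
}

// Host wrapper: returns the deepest level reached, or -1 on error.
int deepestLevelReached() {
    int *dDeepest = nullptr;
    int hDeepest = -1;

    if (cudaMalloc(&dDeepest, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dDeepest, 0, sizeof(int));

    recurse<<<1, 1>>>(0, dDeepest);

    // The parent grid is not complete until all of its children are complete.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        cudaMemcpy(&hDeepest, dDeepest, sizeof(int), cudaMemcpyDeviceToHost);
    } else {
        fprintf(stderr, "recursion failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dDeepest);
    return hDeepest;
}
```

Building this requires a device of compute capability 3.5 or higher, relocatable device code (`-rdc=true`), and linking against `cudadevrt`. For a real merge sort, the same guard could switch to an iterative or single-grid sort once `canLaunchChild` returns false, so that large inputs never depend on exceeding the nesting limit.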

user14717
This is documented in [the programming guide](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#nesting-and-synchronization-depth) as well. – Robert Crovella Jan 05 '15 at 16:59
Seems like something that will need to go in the `cudaDeviceProp` struct eventually. – user14717 Jan 05 '15 at 18:19