CUDA now allows dynamic allocation in global memory from device code. However, I couldn't find any reference to the scalability of that malloc function: is it any better than, for instance, preallocating a chunk of memory and then handing the next sub-chunk to each thread by atomically incrementing a global integer? This "home-made" solution works, but it has an obvious scalability problem, so I wonder whether malloc takes care of that somehow.
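Roughly what I mean by the home-made approach, as a minimal sketch (the pool size, chunk size, and the bump_alloc/worker names are only illustrative):

```
__device__ unsigned int g_next_offset = 0;   // global bump counter (bytes)

// Hand-rolled allocator: every thread claims the next chunk by atomically
// advancing one global integer -- all threads contend on this single counter.
__device__ void* bump_alloc(char* pool, size_t pool_bytes, size_t nbytes)
{
    unsigned int offset = atomicAdd(&g_next_offset, (unsigned int)nbytes);
    if (offset + nbytes > pool_bytes)
        return NULL;                         // pool exhausted
    return pool + offset;
}

__global__ void worker(char* pool, size_t pool_bytes)
{
    // Each thread grabs 64 bytes of scratch space from the pre-allocated pool.
    char* p = (char*)bump_alloc(pool, pool_bytes, 64);
    if (p != NULL)
        p[0] = (char)threadIdx.x;
}

int main()
{
    const size_t pool_bytes = 1 << 20;       // 1 MiB pool, allocated once from the host
    char* pool = NULL;
    cudaMalloc((void**)&pool, pool_bytes);

    worker<<<64, 256>>>(pool, pool_bytes);
    cudaDeviceSynchronize();

    cudaFree(pool);
    return 0;
}
```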

GalicianMario
- Wait -- CUDA allows malloc from GPU code now? – wump Jan 19 '11 at 15:10
- Yep: CUDA Programming Guide 3.2, page 122. – GalicianMario Jan 24 '11 at 01:21
1 Answer
While your "home-made" solution might be just as fast today (contended atomic increments on a single global integer will eventually slow it down), malloc would be my choice.
It lets Nvidia deal with the headache of scalability and make improvements, in either the hardware or the software implementation, that you can take advantage of later simply by recompiling your code.
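For reference, a minimal sketch of the device-side malloc path, assuming a Fermi-class GPU (compute capability 2.0+); the heap-size call is cudaThreadSetLimit in the 3.x runtime and cudaDeviceSetLimit in later releases:

```
__global__ void worker()
{
    // Device-side malloc/free: the runtime's allocator manages the device heap.
    char* p = (char*)malloc(64);
    if (p != NULL) {
        p[0] = (char)threadIdx.x;
        free(p);
    }
}

int main()
{
    // The device heap has a fixed size (8 MB by default); enlarge it before
    // the first launch if many threads allocate concurrently.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

    worker<<<64, 256>>>();
    cudaDeviceSynchronize();
    return 0;
}
```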

Phil
- You are right, I'm just cautious about using malloc since it is a well-known scalability bottleneck in multicore programming (which is why people normally use Hoard). I just wish there were a paper/study on the scalability of the CUDA malloc, so I wouldn't need to resort to hand-made solutions. – GalicianMario Jan 24 '11 at 01:25