CUDA now allows dynamic allocation in global memory from device code. However, I couldn't find any reference to the scalability of that malloc function: is it any better than, for instance, preallocating a chunk of memory and then handing the next sub-chunk to each thread by atomically incrementing a global integer? This "home-made" solution works, but it has an obvious scalability problem, so I wonder whether malloc takes care of that somehow.
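Roughly what I mean by the home-made approach, as a minimal sketch (the pool size, chunk size, and the bump_alloc/worker names are only illustrative):

```
__device__ unsigned int g_next_offset = 0;   // global bump counter (bytes)

// Hand-rolled allocator: every thread claims the next chunk by atomically
// advancing one global integer -- all threads contend on this single counter.
__device__ void* bump_alloc(char* pool, size_t pool_bytes, size_t nbytes)
{
    unsigned int offset = atomicAdd(&g_next_offset, (unsigned int)nbytes);
    if (offset + nbytes > pool_bytes)
        return NULL;                         // pool exhausted
    return pool + offset;
}

__global__ void worker(char* pool, size_t pool_bytes)
{
    // Each thread grabs 64 bytes of scratch space from the pre-allocated pool.
    char* p = (char*)bump_alloc(pool, pool_bytes, 64);
    if (p != NULL)
        p[0] = (char)threadIdx.x;
}

int main()
{
    const size_t pool_bytes = 1 << 20;       // 1 MiB pool, allocated once from the host
    char* pool = NULL;
    cudaMalloc((void**)&pool, pool_bytes);

    worker<<<64, 256>>>(pool, pool_bytes);
    cudaDeviceSynchronize();

    cudaFree(pool);
    return 0;
}
```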

GalicianMario
- Wait -- CUDA allows malloc from GPU code now? – wump Jan 19 '11 at 15:10
- Yep: CUDA Programming Guide 3.2, page 122. – GalicianMario Jan 24 '11 at 01:21
1 Answer
While your "home-made" solution might be just as fast today (contended atomic increments on a single global integer will eventually slow it down), malloc would be my choice.
It lets Nvidia deal with the headache of scalability and make improvements, in either the hardware or the software implementation, that you can take advantage of later simply by recompiling your code.
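For reference, a minimal sketch of the device-side malloc path, assuming a Fermi-class GPU (compute capability 2.0+); the heap-size call is cudaThreadSetLimit in the 3.x runtime and cudaDeviceSetLimit in later releases:

```
__global__ void worker()
{
    // Device-side malloc/free: the runtime's allocator manages the device heap.
    char* p = (char*)malloc(64);
    if (p != NULL) {
        p[0] = (char)threadIdx.x;
        free(p);
    }
}

int main()
{
    // The device heap has a fixed size (8 MB by default); enlarge it before
    // the first launch if many threads allocate concurrently.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

    worker<<<64, 256>>>();
    cudaDeviceSynchronize();
    return 0;
}
```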

Phil
- You are right, I'm just cautious about using malloc since it is a well-known scalability bottleneck in multicore programming (which is why people normally use Hoard). I just wish there were a paper/study on the scalability of the CUDA malloc, so I wouldn't need to resort to hand-made solutions. – GalicianMario Jan 24 '11 at 01:25