
I have the following snippet of a kernel:

__global__ void plain(int* geneVec, float* probs, int* nComponents, float* randomNumbers,int *nGenes)
{

    int xid = threadIdx.x + (blockDim.x * blockIdx.x);

    float* currentProbs= (float*)malloc(sizeof(float)*tmp);

         .....
         .....

    currentProbs[0] = probs[start];
    for (k=1;k<nComponents[0]; k++)
    {
        currentProbs[k] = currentProbs[k-1] + prob;
    }

       ...
       ...
      free(currentProbs);

}

When currentProbs is a statically sized array (even one of the same size), the kernel is very fast, but when currentProbs is dynamically allocated (as above), performance is awful.

This question said I could do this inside a kernel: CUDA allocate memory in __device__ function

Here is a related question: Efficiency of Malloc function in CUDA

Has anyone solved this with a method other than the one proposed in the paper? It seems ridiculous that one cannot malloc/free inside a kernel without this sort of penalty.

RNs_Ghost

1 Answer


I think introducing malloc() slows your code down because in-kernel malloc() allocates from a heap in global memory. When you use a fixed-size array, the compiler is likely to keep it in registers (or, failing that, in local memory), and registers are much faster.
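As a rough illustration of the fixed-size case, here is a minimal sketch. `MAX_COMPONENTS`, `cumulativeFixed`, `cumulativeFixedDemo`, and the data layout (each thread owning `MAX_COMPONENTS` consecutive entries of `probs`) are assumptions made for the example, not part of the original kernel. Because the array size and loop bound are compile-time constants, the compiler has a chance to unroll the loop and keep `currentProbs` in registers, though that is not guaranteed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical compile-time upper bound on the number of components.
#define MAX_COMPONENTS 8

// Each thread builds a running sum over its own MAX_COMPONENTS probabilities
// in a fixed-size per-thread array and writes out the final cumulative value.
__global__ void cumulativeFixed(const float* probs, float* lastCumulative, int nThreads)
{
    int xid = threadIdx.x + (blockDim.x * blockIdx.x);
    if (xid >= nThreads) return;

    float currentProbs[MAX_COMPONENTS];   // no malloc(): size known at compile time
    int start = xid * MAX_COMPONENTS;     // assumed layout, see lead-in

    currentProbs[0] = probs[start];
    #pragma unroll
    for (int k = 1; k < MAX_COMPONENTS; k++)
    {
        currentProbs[k] = currentProbs[k - 1] + probs[start + k];
    }
    lastCumulative[xid] = currentProbs[MAX_COMPONENTS - 1];
}

// Host-side helper for the example: fills every probability with 0.125f,
// launches the kernel, and returns the result of the last thread.
float cumulativeFixedDemo()
{
    const int nThreads = 64;
    const int n = nThreads * MAX_COMPONENTS;
    std::vector<float> hostProbs(n, 0.125f);
    std::vector<float> hostOut(nThreads, 0.0f);

    float* dProbs = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dProbs, n * sizeof(float));
    cudaMalloc(&dOut, nThreads * sizeof(float));
    cudaMemcpy(dProbs, hostProbs.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cumulativeFixed<<<2, 32>>>(dProbs, dOut, nThreads);

    cudaError_t err = cudaMemcpy(hostOut.data(), dOut, nThreads * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(dProbs);
    cudaFree(dOut);
    return hostOut[nThreads - 1];
}
```

Calling `cumulativeFixedDemo()` from `main` should return the sum of eight values of 0.125 for the last thread.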

Needing malloc() inside your kernel may mean that you're trying to do too much work in a single kernel. If each thread allocates a different amount of memory, then each thread executes the for loop a different number of times, and you get a lot of warp divergence.

If every thread in a warp runs the loop the same number of times, just allocate the memory up front. Even if the counts differ, you can use a constant size that is large enough for the largest case. Better still, I think you should look at how you can refactor your code to remove that loop from your kernel entirely.
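One way to allocate up front could look like the sketch below: the host calls cudaMalloc() once for a scratch buffer sized for the worst case, and each thread uses its own slice of it. The names `cumulativeScratch`, `cumulativeScratchDemo`, `scratch`, and `maxComponents`, the per-thread counts in `nComponents`, and the data layout are all assumptions for illustration; the original kernel's indexing is elided, so this is not a drop-in replacement.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread uses a constant-size slice of a scratch buffer that the host
// allocated once, instead of calling malloc()/free() inside the kernel.
__global__ void cumulativeScratch(const float* probs, const int* nComponents,
                                  float* scratch, float* lastCumulative,
                                  int maxComponents, int nThreads)
{
    int xid = threadIdx.x + (blockDim.x * blockIdx.x);
    if (xid >= nThreads) return;

    float* currentProbs = scratch + xid * maxComponents;  // this thread's slice
    int start = xid * maxComponents;                      // assumed layout
    int n = nComponents[xid];                             // assumed 1 <= n <= maxComponents

    currentProbs[0] = probs[start];
    for (int k = 1; k < n; k++)
    {
        currentProbs[k] = currentProbs[k - 1] + probs[start + k];
    }
    lastCumulative[xid] = currentProbs[n - 1];
}

// Host-side helper for the example: every probability is 0.25f and thread i
// sums 1 + (i % 8) of them. Returns the result computed by the given thread.
float cumulativeScratchDemo(int thread)
{
    const int nThreads = 64;
    const int maxComponents = 8;
    const int n = nThreads * maxComponents;

    std::vector<float> hostProbs(n, 0.25f);
    std::vector<int> hostCounts(nThreads);
    for (int i = 0; i < nThreads; i++) hostCounts[i] = 1 + (i % maxComponents);
    std::vector<float> hostOut(nThreads, 0.0f);

    float* dProbs = nullptr;
    float* dScratch = nullptr;
    float* dOut = nullptr;
    int* dCounts = nullptr;
    cudaMalloc(&dProbs, n * sizeof(float));
    cudaMalloc(&dScratch, n * sizeof(float));   // allocated once, up front
    cudaMalloc(&dOut, nThreads * sizeof(float));
    cudaMalloc(&dCounts, nThreads * sizeof(int));
    cudaMemcpy(dProbs, hostProbs.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dCounts, hostCounts.data(), nThreads * sizeof(int), cudaMemcpyHostToDevice);

    cumulativeScratch<<<2, 32>>>(dProbs, dCounts, dScratch, dOut, maxComponents, nThreads);

    cudaError_t err = cudaMemcpy(hostOut.data(), dOut, nThreads * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(dProbs);
    cudaFree(dScratch);
    cudaFree(dOut);
    cudaFree(dCounts);
    return hostOut[thread];
}
```

The scratch buffer still lives in global memory, so this only removes the cost of the in-kernel allocator, not the cost of the memory accesses. An interleaved layout such as `scratch[k * nThreads + xid]` would probably coalesce better than the contiguous per-thread slices used here.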

Roger Dahl
    The compiler will never assign kernel variables to shared memory unless the programmer defines them using the `__shared__` qualifier. Only registers or local memory. – talonmies Mar 21 '12 at 15:33