I have a __global__ function in CUDA. Can it call itself?

Here is my example:

__global__ void 
force_create_empty_nodes (struct NODE *Nodes, int topnode, int bits, int no, int x, int y, 
                          int z, struct topnode_data *TopNodes)
{
    /// * Some code *///
    force_create_empty_nodes <<<1, 8>>>(Nodes, topnode+1, bits+1, no+1, 
                                             x+1, y+1, z+1, TopNodes);
}

The error I receive is:

error: kernel launch from __device__ or __global__ functions requires separate compilation mode

Here is my make command:

nvcc -c -arch compute_35 cudaForceNodes.cu -o obj/cudaForceNodes.o
talonmies
shoc
  • Take a look at the answers to this question: http://stackoverflow.com/questions/3644809/does-cuda-support-recursion – Michael Sep 19 '13 at 20:31
  • I looked there, but my CUDA version is 5.5 and my GPU is a GeForce GT 550M. I tried a __device__ function and it works, but when I try a __global__ function I can't build the project. I need to create a new thread in every recursion. Is it possible some other way? – shoc Sep 19 '13 at 21:02
  • If you have a GT 550M, dynamic parallelism (i.e. launching a kernel from a kernel) isn't supported, and code compiled for the sm_35 architecture won't run on your GPU. – talonmies Sep 19 '13 at 21:05

1 Answer

Calling a kernel from another kernel is called dynamic parallelism. It is documented in the CUDA C Programming Guide.

It requires:

  1. A device of compute capability 3.5 or higher. You can find the compute capability of your device by running the CUDA deviceQuery sample.
  2. Several switches in the compile command: one targeting a cc 3.5 architecture (e.g. -arch=sm_35), one enabling separate (device) compilation (-rdc=true), and one linking against the device runtime (-lcudadevrt).
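If you would rather check the first requirement from code than run deviceQuery, something like the following minimal sketch should work. It uses the runtime call cudaGetDeviceProperties; the helper name supports_dynamic_parallelism is my own invention, and I am assuming the GPU of interest is device 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): dynamic parallelism needs
// compute capability 3.5 or higher.
static bool supports_dynamic_parallelism(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 5);
}

int main()
{
    cudaDeviceProp prop;
    // Assumption: the GPU of interest is device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("%s: compute capability %d.%d, dynamic parallelism %s\n",
                prop.name, prop.major, prop.minor,
                supports_dynamic_parallelism(prop.major, prop.minor)
                    ? "supported" : "not supported");
    return 0;
}
```

This file needs no special switches; a plain nvcc invocation is enough, since it only queries the device.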

Since your GT 550M is a cc 2.1 device, not a cc 3.5 device, you won't be able to use this feature. There is no other way to call a kernel from within a kernel.
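For reference, on a device that does meet the requirements, a kernel launching itself could look roughly like the sketch below. This is not your code: the kernel name recurse and the max_depth parameter are hypothetical, and I have added a depth check because a kernel that relaunches itself unconditionally (as in the question) would never terminate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per block launches a child grid
// until max_depth levels have run.
__global__ void recurse(int depth, int max_depth)
{
    if (threadIdx.x == 0) {
        printf("depth %d\n", depth);
        if (depth + 1 < max_depth) {
            // Device-side launch: this is the dynamic parallelism feature.
            recurse<<<1, 8>>>(depth + 1, max_depth);
        }
    }
}

int main()
{
    recurse<<<1, 8>>>(0, 3);
    // Wait for the parent grid and all of its child grids to finish.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Assuming the file is saved as recurse.cu (a made-up name), a command along the lines of `nvcc -arch=sm_35 -rdc=true recurse.cu -o recurse -lcudadevrt` should build it on a toolkit that still supports sm_35. The -rdc=true switch is what the "requires separate compilation mode" error is asking for.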

Robert Crovella