Questions tagged [dynamic-parallelism]

Dynamic parallelism refers to a capability in CUDA that allows device kernel launches to be performed from within a device kernel.

Use this tag for questions about CUDA dynamic parallelism: the ability of CUDA devices of compute capability 3.5 or higher to launch a device kernel from within a device kernel. Using this functionality also requires certain CUDA compilation switches, such as the switch to enable relocatable device code (-rdc=true) and the switch to link in the device runtime library (-lcudadevrt).
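As a rough illustration of what the tag covers, here is a minimal sketch of a parent kernel launching a child kernel. The file name dp_example.cu and the kernel names parentKernel and childKernel are hypothetical, chosen only for this example. It assumes a GPU of compute capability 3.5 or higher and a toolkit that still supports the chosen -arch value.

```cuda
// Hypothetical file: dp_example.cu
// One possible build line (adjust -arch to your GPU; recent toolkits no longer accept some older targets):
//   nvcc -arch=sm_35 -rdc=true dp_example.cu -o dp_example -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread doubles one element of the array.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2;
    }
}

// Parent kernel: a single thread launches the child grid from device code.
// The child grid finishes before the parent grid is considered complete.
__global__ void parentKernel(int *data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data, n);
    }
}

int main()
{
    const int n = 256;
    int h[n];
    for (int i = 0; i < n; ++i) {
        h[i] = i;
    }

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    // Host-side launch of the parent; the child launch happens on the device.
    parentKernel<<<1, 1>>>(d, n);

    // Waits for the parent grid and, with it, the child grid.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }

    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Verify that every element was doubled by the child kernel.
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2 * i) {
            std::printf("mismatch at index %d\n", i);
            return 1;
        }
    }
    std::printf("all %d elements doubled\n", n);
    return 0;
}
```

The sketch deliberately avoids synchronizing on the child from inside the parent kernel and relies on the host-side cudaDeviceSynchronize() instead, since the parent grid does not complete until its child grids have.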

50 questions
0 votes, 2 answers

CMake to generate a MSVC CUDA project that targets newer devices

My PC has a GTX 580 (compute capability 2.0). I want to compile a CUDA source that uses dynamic parallelism, a feature introduced in compute capability 3.5. I know I will not be able to run the program on my GPU; however, it should be possible to…
0 votes, 1 answer

Parallelize a method from inside a CUDA device function / kernel

I've got an already parallelized CUDA kernel that performs some tasks requiring frequent interpolation. So there's a kernel __global__ void complexStuff(...) which calls this interpolation device function one or more times: __device__ void…
-1 votes, 1 answer

Dynamic Parallelism - separate compilation: undefined reference to __cudaRegisterLinkedBinary

Although I have followed Appendix C, "Compiling Dynamic Parallelism", from the "CUDA Programming Guide" and the solutions given here, I cannot solve the problem I have. After compiling and linking (make DivideParalelo), I get the following…
emartel
-2 votes, 1 answer

CUDA Dynamic Parallelism Dereferencing Global Memory

To test out dynamic parallelism, I wrote a simple program and compiled it on a GTX 1080 with the following commands:
nvcc -arch=sm_35 -dc dynamic_test.cu -o dynamic_test.o
nvcc -arch=sm_35 dynamic_test.o -lcudadevrt -o dynamic_test
However, the…
JYC
-3 votes, 1 answer

Optimise Algorithm Using Dynamic Parallelism

I have the following code fragment and am experimenting with features of the new Kepler architecture. The kernel is called several times in a loop with a fixed NUM_ITERATIONS. Do you think shifting the loop into a parent kernel would help, i.e., is the…