Questions tagged [dynamic-parallelism]

Dynamic parallelism refers to the CUDA capability to launch a device kernel from within a device kernel.

This tag should be used for questions pertaining to CUDA dynamic parallelism, i.e., the capability of CUDA devices of compute capability 3.5 or higher to launch a device kernel from within a device kernel. Using this functionality also requires certain CUDA compilation switches, such as the switch to enable relocatable device code (-rdc=true) and the switch to link in the device runtime library (-lcudadevrt).
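As a minimal sketch (hypothetical file name; a compute-capability 3.5 target assumed), a dynamic-parallelism program and the two switches mentioned above might look like:

    // dp_example.cu — minimal dynamic-parallelism example.
    // Build: nvcc -arch=sm_35 -rdc=true dp_example.cu -lcudadevrt -o dp_example
    #include <cstdio>

    __global__ void child(int parentThread)
    {
        printf("child launched by parent thread %d\n", parentThread);
    }

    __global__ void parent()
    {
        // Each parent thread launches a one-thread child grid.
        child<<<1, 1>>>(threadIdx.x);
    }

    int main()
    {
        parent<<<1, 4>>>();
        cudaDeviceSynchronize();   // host-side wait for parent and children
        return 0;
    }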

50 questions
0 votes, 1 answer

Synchronizing depth of nested kernels

Let's take the following code, where there is a parent and a child kernel. From the parent kernel we wish to start threadIdx.x child kernels in different streams to maximize parallel throughput. We then wait for those children with…
user2255757 • 756
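For reference, a hedged sketch of the pattern this question describes: per-thread device-side streams, assuming a pre-CUDA-12 toolkit where device-side cudaDeviceSynchronize() is still available (it was removed in CUDA 12).

    __global__ void child(int id) { /* ... child work ... */ }

    __global__ void parent()
    {
        cudaStream_t s;
        // Device-created streams must use the non-blocking flag.
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
        child<<<1, 32, 0, s>>>(threadIdx.x);   // one child per parent thread
        cudaStreamDestroy(s);
        // Waits for all children launched by this block so far.
        cudaDeviceSynchronize();
    }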
0 votes, 1 answer

Compile multiple CUDA files (that use dynamic parallelism) and MPI code

I have a bunch of .cu files that use dynamic parallelism (a.cu, b.cu, c.cu.., e.cu, f.cu), and a main.c file that uses MPI to call functions from a.cu on multiple nodes. I'm trying to write a makefile to compile the executable, but I keep facing…
user2330963 • 7
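A sketch of the usual separate-compilation flow for this setup; the wrapper name and the exact link line are assumptions, not taken from the question:

    // a.cu — each .cu is compiled with -dc, then device-linked once:
    //   nvcc -arch=sm_35 -dc a.cu b.cu c.cu e.cu f.cu
    //   nvcc -arch=sm_35 -dlink a.o b.o c.o e.o f.o -o dlink.o
    //   mpicc main.c a.o b.o c.o e.o f.o dlink.o \
    //         -L/usr/local/cuda/lib64 -lcudadevrt -lcudart -o app
    __global__ void kernel_a() { /* ... dynamic-parallelism code ... */ }

    // extern "C" wrapper so the MPI main.c can call into a.cu.
    extern "C" void launch_a()
    {
        kernel_a<<<1, 1>>>();
        cudaDeviceSynchronize();
    }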
0 votes, 1 answer

Synchronization in CUDA dynamic parallelism

I am testing dynamic parallelism with the following kernel, which finds the maximum value of an integer array in a divide-and-conquer fashion: __global__ void getMax(int * arr, int ini, int fin, int * maxv) { if…
Matias Haeussler • 1,061
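A hedged sketch of that divide-and-conquer shape, using the same signature as the question. It assumes device-side malloc and the pre-CUDA-12 device-side cudaDeviceSynchronize(); note that children cannot read the parent's local variables, so partial results must live in global memory, and that recursion depth is capped by cudaLimitDevRuntimeSyncDepth.

    __global__ void getMax(int *arr, int ini, int fin, int *maxv)
    {
        if (fin - ini < 2) {              // base case: one element
            *maxv = arr[ini];
            return;
        }
        int mid = ini + (fin - ini) / 2;
        // Partial results must be in global memory, not parent locals.
        int *part = (int *)malloc(2 * sizeof(int));
        getMax<<<1, 1>>>(arr, ini, mid, &part[0]);
        getMax<<<1, 1>>>(arr, mid, fin, &part[1]);
        cudaDeviceSynchronize();          // wait for both children
        *maxv = max(part[0], part[1]);
        free(part);
    }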
0 votes, 1 answer

How can I synchronize device-side command queues with host-side queues? clFinish() and markerWithWaitList give an invalid queue error

I'm using OpenCL 2.0's dynamic parallelism feature and have each work-item enqueue another kernel with a single work-item. When the child kernel's completion time is high, the parent kernel completes before its children and memory consistency is not preserved…
huseyin tugrul buyukisik • 11,469
0 votes, 1 answer

Using shared memory in Dynamic Parallelism CUDA

Question 1: Do I have to specify the amount of dynamic shared memory to be allocated at the launch of the parent kernel if shared memory is only used by the child kernel? Question 2: The following are my child kernel and parent kernel: Parent…
Aliya Clark • 131
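To Question 1 the usual answer is no: dynamic shared memory is a per-launch property, so it is sized in the child's own launch configuration. A minimal sketch (sizes are illustrative):

    __global__ void child()
    {
        extern __shared__ float buf[];   // sized by the child's launch below
        buf[threadIdx.x] = (float)threadIdx.x;
    }

    __global__ void parent()
    {
        if (threadIdx.x == 0)
            child<<<1, 128, 128 * sizeof(float)>>>();  // shared bytes set here
    }

    // Host side: parent<<<1, 32>>>(); — no dynamic shared memory needed.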
0 votes, 1 answer

Dynamic parallelism - passing contents of shared memory to spawned blocks?

I've been writing CUDA kernels for a while now, but I've not used dynamic parallelism (DP) yet. I've come up against a task for which I think it might fit; however, the way I would like to be able to use DP is: if a block figures out it needs more…
einpoklum • 118,144
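A sketch of the standard workaround, since a child grid cannot see its parent's __shared__ memory: stage the data through a preallocated global buffer (the scratch parameter here is an assumption):

    __global__ void child(const float *staged, int n) { /* ... */ }

    __global__ void parent(float *scratch)   // global, one slot per thread
    {
        extern __shared__ float smem[];
        smem[threadIdx.x] = (float)threadIdx.x;  // ... fill shared memory ...
        __syncthreads();

        // Copy the block's shared contents out to global memory.
        scratch[blockIdx.x * blockDim.x + threadIdx.x] = smem[threadIdx.x];
        __syncthreads();

        if (threadIdx.x == 0)
            child<<<1, blockDim.x>>>(&scratch[blockIdx.x * blockDim.x],
                                     blockDim.x);
    }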
0 votes, 1 answer

"device-function-maxrregcount" message while compiling cuda code

I am trying to write code that performs multiple vector dot products inside the kernel. I'm using the cublasSdot function from the cuBLAS library to perform the dot products. This is my code: using namespace std; __global__ void ker(float * a, float *…
starrr • 1,013
0 votes, 1 answer

Trouble compiling/running CUDA code involving dynamic parallelism

I am trying to use dynamic parallelism with CUDA, but I cannot get through the compilation step. I am working on a GPU with compute capability 3.5 and CUDA 7.5. Depending on the switches in the compile command I use, I am getting…
VincentN • 63
0 votes, 1 answer

Accessing CUDA built-in variable in child kernel

I'm trying to use Kepler's Dynamic Parallelism for one of my applications. The global index of the thread (in the parent kernel) launching the child kernel is needed in the child kernel. In other words, I want to access the parent's built-in…
user3813674 • 2,553
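A minimal sketch of the common answer: built-in variables are specific to each grid, so the parent's global index is passed down as an ordinary kernel argument:

    __global__ void child(int parentGlobalIdx)
    {
        // parentGlobalIdx identifies the parent thread that launched us.
    }

    __global__ void parent()
    {
        int gidx = blockIdx.x * blockDim.x + threadIdx.x;
        child<<<1, 32>>>(gidx);
    }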
0 votes, 1 answer

Cublas not working within kernel once compiled to cubin using -G flag with nvcc

I have a CUDA kernel that looks like the following: #include #include #include extern "C" { __device__ float ONE = 1.0f; __device__ float M_ONE = -1.0f; __device__ float ZERO = 0.0f; …
Bam4d • 610
0 votes, 1 answer

Do kernel-launched child kernels have the same warp size as host-launched kernels?

When a kernel block is launched from the host, it has a warp size of 32. Is it the same for child kernels launched via dynamic parallelism? My guess would be yes, but I haven't seen it in the docs. The larger question, of course, is: is it worth it?…
mmdanziger • 4,466
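A small check is easy to write, since warpSize is a property of the device rather than of how the grid was launched (it is 32 on all current NVIDIA GPUs); a sketch:

    #include <cstdio>

    __global__ void child()
    {
        if (threadIdx.x == 0) printf("child warpSize  = %d\n", warpSize);
    }

    __global__ void parent()
    {
        if (threadIdx.x == 0) printf("parent warpSize = %d\n", warpSize);
        child<<<1, 32>>>();
    }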
0 votes, 1 answer

CUDA dynamic parallelism linkage error with extern "C"

I'm trying to link my CUDA Kepler's Dynamic Parallelism program as follows: nvcc -m32 -arch=sm_35 -dc -Xcompiler '-fPIC' DFS_Solving.cu nvcc -m32 -arch=sm_35 -Xcompiler '-fPIC' -dlink DFS_Solving.o -o link.o gcc -shared -Wl,-soname,libdfs.so -o…
0 votes, 1 answer

Dynamic Parallelism in CUDA not working

I wrote a simple code to understand dynamic parallelism. From the values being printed, I see that the child kernel has executed correctly, but when I come back to the parent kernel, I see wrong values being used in place of the temp array, which is being…
Jagannath • 47
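The symptom described usually means the parent reads a child's output without waiting for the child to finish. A sketch of the fix on pre-CUDA-12 toolkits (where device-side cudaDeviceSynchronize() exists), with temp assumed to be a global-memory buffer:

    #include <cstdio>

    __global__ void child(int *temp) { temp[threadIdx.x] = threadIdx.x; }

    __global__ void parent(int *temp)
    {
        child<<<1, 32>>>(temp);
        cudaDeviceSynchronize();     // without this, temp may still be stale
        if (threadIdx.x == 0)
            printf("temp[5] = %d\n", temp[5]);
    }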
0 votes, 1 answer

CUDA dynamic parallelism: invalid global write when using texture memory

I seem to have trouble when a kernel called within a kernel (even a recursive call) uses texture memory to get a value. If the child kernel, say a different one, doesn't use texture memory, everything is fine. If I don't call a kernel within a kernel,…
salvaS • 13
0 votes, 1 answer

Nvidia Jetson TK1 Development Board - Cuda Compute Capability

I am quite impressed with this development kit. Instead of buying a new CUDA card, which might require a new main board etc., this board seems to provide it all in one. Its specs say it has CUDA compute capability 3.2. AFAIK dynamic parallelism…
phoad • 1,801