Questions tagged [dynamic-parallelism]

Dynamic parallelism refers to the CUDA capability to launch a device kernel from within another device kernel.

This tag should be used for questions pertaining to CUDA dynamic parallelism: the capability of CUDA devices of compute capability 3.5 or higher to launch a device kernel from within a device kernel. Using this functionality also requires certain CUDA compilation switches, such as the switch that enables relocatable device code (-rdc=true) and the switch that links in the device runtime library (-lcudadevrt).
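As a hedged illustration (file and kernel names are hypothetical), the smallest possible dynamic-parallelism program and its compile line look roughly like this:

    #include <cstdio>

    __global__ void child()
    {
        // Runs on the device, but is launched from the parent kernel below
        printf("child thread %d\n", threadIdx.x);
    }

    __global__ void parent()
    {
        // Device-side kernel launch: the defining feature of dynamic parallelism
        child<<<1, 4>>>();
    }

    int main()
    {
        parent<<<1, 1>>>();
        cudaDeviceSynchronize();  // wait for the parent grid (and its children)
        return 0;
    }

    // Compile sketch: nvcc -arch=sm_35 -rdc=true dp.cu -o dp -lcudadevrt

Here -rdc=true enables relocatable device code and -lcudadevrt links the device runtime library that services device-side launches.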

50 questions
1
vote
3 answers

Generating Relocatable Device Code using NVIDIA Nsight

I'm trying to compile a dynamic parallelism example on CUDA, and when I try to compile it gives an error saying: kernel launch from __device__ or __global__ functions requires separate compilation mode. Later I found that I have to set the…
BAdhi
  • 420
  • 7
  • 19
1
vote
2 answers

Nested Directives in OpenACC

I'm trying to use the nesting feature of OpenACC to activate dynamic parallelism on my GPU card. I have a Tesla 40c and my OpenACC compiler is PGI version 15.7. My code is very simple. When I try to compile the following code, the compiler returns these messages…
grypp
  • 405
  • 2
  • 15
1
vote
2 answers

Understanding Dynamic Parallelism in CUDA

Example of dynamic parallelism: __global__ void nestedHelloWorld(int const iSize,int iDepth) { int tid = threadIdx.x; printf("Recursion=%d: Hello World from thread %d" "block %d\n",iDepth,tid,blockIdx.x); // condition to stop recursive…
John
  • 3,037
  • 8
  • 36
  • 68
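The code in the excerpt above is cut off; a minimal reconstruction of the recursive pattern it shows (the stop condition and relaunch logic are assumptions based on the visible fragment):

    #include <cstdio>

    __global__ void nestedHelloWorld(int const iSize, int iDepth)
    {
        int tid = threadIdx.x;
        printf("Recursion=%d: Hello World from thread %d block %d\n",
               iDepth, tid, blockIdx.x);

        // condition to stop recursive execution
        if (iSize == 1) return;

        // halve the thread count; thread 0 launches the child grid
        int nthreads = iSize / 2;
        if (tid == 0 && nthreads > 0) {
            nestedHelloWorld<<<1, nthreads>>>(nthreads, ++iDepth);
            printf("-------> nested execution depth: %d\n", iDepth);
        }
    }

Launched from the host as, say, nestedHelloWorld<<<1, 8>>>(8, 0), each level prints its greeting and relaunches itself with half as many threads until a single thread remains.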
1
vote
1 answer

Is it possible to call cublas functions from a device function?

Here, Robert Crovella said that cuBLAS routines can be called from device code. Although I am using dynamic parallelism and compiling for compute capability 3.5, I cannot manage to call cuBLAS routines from a device function. I always get the…
emartel
  • 49
  • 9
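For context: a device-side cuBLAS API did exist, but only in older toolkits (roughly CUDA 5.5 through 9.x; it was removed in CUDA 10), and it required compute capability 3.5 plus linking the device library. A legacy-toolkit sketch, not valid on current CUDA:

    #include <cublas_v2.h>

    __global__ void deviceGemm(int n, const float *A, const float *B, float *C)
    {
        // Legacy device-side cuBLAS: the handle is created inside the kernel
        cublasHandle_t handle;
        cublasCreate(&handle);

        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, A, n, B, n, &beta, C, n);

        cublasDestroy(handle);
    }

    // Legacy link sketch: nvcc -arch=sm_35 -rdc=true gemm.cu -lcublas_device -lcudadevrt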
1
vote
1 answer

CUDA recursion depth

When using Dynamic Parallelism in CUDA, you can implement recursive algorithms like mergeSort. I have implemented it, but my program doesn't work for inputs greater than blah. My question is how many levels of depth in the recursion tree the implementation can…
AmirSojoodi
  • 1,080
  • 2
  • 12
  • 31
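The reachable recursion depth is bounded by device-runtime limits (the hardware caps nesting at 24 levels), which must be raised from the host before the first launch. A sketch with assumed values:

    // Depth at which a device-side cudaDeviceSynchronize() is still legal
    cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, 16);

    // How many child launches can be buffered before launches start failing
    cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 4096);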
1
vote
1 answer

numba.typeinfer.TypingError: Untyped global name 'child_launch' when using CUDA Dynamic Parallelism in Python (Anaconda) on an NVIDIA GPU

My code is here: import numpy as np from numbapro import cuda @cuda.autojit def child_launch(data): data[cuda.threadIdx.x] = data[cuda.threadIdx.x] + 100 @cuda.autojit def parent_launch(data): data[cuda.threadIdx.x] = cuda.threadIdx.x …
1
vote
1 answer

How to compile a .cu with dynamic parallelism?

I have 2 .cpp files, setup and functions, and 6 .cu files: main, flood, timestep, discharge, continuity and copy. I'm trying to compile this so that main calls the .cpp files and the global flood kernel, and flood then calls timestep, discharge, continuity…
Seffrin
  • 23
  • 5
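A hedged sketch of the separate-compilation flow for a project shaped like this one (file names taken from the question, architecture assumed):

    # Compile every CUDA unit with relocatable device code
    nvcc -arch=sm_35 -rdc=true -c main.cu flood.cu timestep.cu discharge.cu continuity.cu copy.cu

    # Compile the C++ units normally
    nvcc -c setup.cpp functions.cpp

    # Let nvcc device-link and host-link everything
    nvcc -arch=sm_35 -rdc=true main.o flood.o timestep.o discharge.o continuity.o copy.o \
         setup.o functions.o -o app -lcudadevrt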
0
votes
1 answer

CUDA dynamic parallelism is computing sequentially

I need to write an application that computes some matrices from other matrices. In general, it sums outer products of rows of the initial matrix E and multiplies them by some numbers calculated from v and t for each t in a given range. I am a newbie to…
0
votes
1 answer

How do I wait for child kernels to finish in a parent kernel before executing the rest of the parent kernel in CUDA dynamic parallelism?

So I need the runParatron children to fully finish before the next iteration of the for loop happens. Based on the results I am getting, I'm pretty sure that's not happening. For example, I have a print statement in runParatron that executes AFTER…
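The classic answer was a device-side cudaDeviceSynchronize() after the child launch, which blocks the calling parent thread until its child grids have finished (this device-side form was deprecated in CUDA 11.6 and removed in CUDA 12.0). A sketch with hypothetical signatures and launch dimensions:

    __global__ void runParatron(float *data);   // hypothetical child kernel

    __global__ void parent(float *data, int nIters)
    {
        for (int i = 0; i < nIters; ++i) {
            runParatron<<<4, 128>>>(data);
            // Block this parent thread until the child grid completes
            // (legacy device-side sync; removed in CUDA 12.0)
            cudaDeviceSynchronize();
        }
    }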
0
votes
1 answer

Can I copy files from SharePoint to Azure Blob Storage using a dynamic file path?

I am building a pipeline at work to copy files from SharePoint to Azure Blob Storage. After reading some documentation, I was able to create a pipeline that only copies certain files. However, I would like to automate this pipeline by using dynamic…
0
votes
1 answer

CUDA dynamic parallelism: Access child kernel results in global memory

I am currently trying my first dynamic parallelism code in CUDA. It is pretty simple. In the parent kernel I am doing something like this: int aPayloads[32]; // Compute aPayloads start values here int* aGlobalPayloads =…
Silicomancer
  • 8,604
  • 10
  • 63
  • 130
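The pattern the question is reaching for, sketched with the excerpt's names (everything past the visible fragment is assumed): local arrays are not visible to child grids, so results must round-trip through global memory.

    __global__ void child(int *aGlobalPayloads)
    {
        // The child reads and writes global memory that both grids can see
        aGlobalPayloads[threadIdx.x] *= 2;
    }

    __global__ void parent()
    {
        int aPayloads[32];                 // local memory: NOT visible to child grids
        // ... compute aPayloads start values here ...

        // Device-side heap allocation gives the child a buffer in global memory
        int *aGlobalPayloads = (int *)malloc(32 * sizeof(int));
        memcpy(aGlobalPayloads, aPayloads, 32 * sizeof(int));

        child<<<1, 32>>>(aGlobalPayloads);
        cudaDeviceSynchronize();           // legacy device-side sync (see above)

        // aGlobalPayloads now holds the child's results
        free(aGlobalPayloads);
    }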
0
votes
1 answer

Can a CUDA parent kernel launch a child kernel with more threads than the parent?

I'm trying to learn how to use CUDA Dynamic Parallelism. I have a simple CUDA kernel that creates some work, then launches new kernels to perform that work. Let's say I launch the parent kernel with only 1 block of 1 thread, like so: int nItems =…
Justin
  • 1,881
  • 4
  • 20
  • 40
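Yes: a child grid's dimensions are independent of the parent's. A sketch along the lines of the question's setup (names hypothetical):

    __global__ void childKernel(int *items, int nItems)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nItems) items[i] = i;      // the child runs far more threads than the parent
    }

    __global__ void parentKernel(int *items, int nItems)
    {
        // The parent runs as a single thread yet launches an arbitrarily shaped child grid
        int threads = 256;
        int blocks  = (nItems + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(items, nItems);
    }

    // Host side: parentKernel<<<1, 1>>>(d_items, nItems);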
0
votes
1 answer

Why is cudaLaunchCooperativeKernel() returning "not permitted"?

So I am using a GTX 1050 with compute capability 6.1 and CUDA 11.0. I need grid synchronization in my program, so cudaLaunchCooperativeKernel() is needed. I have checked my device query, and the GPU does have support for cooperative groups…
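A quick host-side check for this situation (device 0 assumed): cooperative launch has its own device attribute, and on Windows it typically reads 0 under the WDDM driver model even when the GPU hardware is capable.

    int supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, 0);
    if (!supported) {
        // cudaLaunchCooperativeKernel() will fail on this device/driver combination
    }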
0
votes
1 answer

How to call a Thrust function in a stream from a kernel?

I want to make thrust::scatter asynchronous by calling it in a device kernel (I could also do it by calling it in another host thread). thrust::cuda::par.on(stream) is a host function that cannot be called from a device kernel. The following code was…
heapoverflow
  • 264
  • 2
  • 12
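For reference, Thrust algorithms can be invoked from device code with a device execution policy when the file is compiled with -rdc=true; whether that dispatches a child grid depends on the Thrust version. A sketch:

    #include <thrust/execution_policy.h>
    #include <thrust/scatter.h>

    __global__ void scatterKernel(const int *values, const int *map,
                                  int *output, int n)
    {
        // On Thrust 1.8-era toolkits this could launch a child grid via dynamic
        // parallelism; modern Thrust instead runs it sequentially in this thread.
        thrust::scatter(thrust::device, values, values + n, map, output);
    }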
0
votes
1 answer

NVIDIA Visual Profiler not showing cudaMalloc() after kernel launch

I am trying to write a program that runs almost entirely on the GPU (with very little interaction with the host). initKernel is the first kernel launched from the host. I use dynamic parallelism to launch successive kernels from the…
progammer
  • 1,951
  • 11
  • 28
  • 50