Highest Voted 'cuda-streams' Questions

0 votes, 1 answer

Using constant memory with MPI and streams

If I have a __constant__ value, __constant__ float constVal;, which may or may not be initialized by MPI ranks on non-blocking streams: cudaMemcpyToSymbolAsync((void*)&constVal,deviceValue,sizeof(float),0,cudaMemcpyDeviceToDevice,stream); Is…

cuda mpi cuda-streams

asked Feb 16 '21 at 22:56

Jacob Faib

0 votes, 1 answer

CUDA cudaMemcpyAsync using single stream to host

I have a single kernel that fills two parameters (dev_out_1 and dev_out_2) with data using a single stream. I want to copy the data back from the device to the host in parallel. My requirement is to use a single stream and copy back to the host in…

cuda cuda-streams

asked Feb 07 '21 at 13:59

Yona

0 votes, 1 answer

CUDA C++ overlapping SERIAL kernel execution and data transfer

So this guide here shows the general way to overlap kernel execution and data transfer. cudaStream_t streams[nStreams]; for (int i = 0; i < nStreams; ++i) { cudaStreamCreate(&streams[i]); int offset = ...; cudaMemcpyAsync(&d_a[offset],…

c++ memory cuda transfer cuda-streams

asked Aug 15 '20 at 01:34

Duke Le

0 votes, 1 answer

Is it possible to manually set the SMs used for one CUDA stream?

By default, the kernel will use all available SMs of the device (if there are enough blocks). However, I now have two streams, one compute-intensive and one memory-intensive, and I want to limit the maximum number of SMs used by each stream (after setting…

cuda nvidia cudnn cuda-streams

asked Jun 23 '20 at 07:36

Subject_No_i

0 votes, 1 answer

Why could OpenCV wait for a stream-ed CUDA operation instead of proceeding asynchronously?

I'm trying to perform some image dilation using OpenCV & CUDA. I make two consecutive calls to filter->apply(...), each with a different filter object, on a different Mat, and with a different stream. They DO get…

opencv cuda-streams

asked May 21 '20 at 17:38

BIOStheZerg

0 votes, 1 answer

Overlapping transfers and kernel executions in CUDA with two loops

I want to overlap data transfers and kernel executions in a form like this: int numStreams = 3; int size = 10; for(int i = 0; i < size; i++) { cuMemcpyHtoDAsync( _bufferIn1, _host_memoryIn1 ), …

cuda overlap cuda-streams

asked Apr 16 '20 at 21:14

Eagle06

0 votes, 1 answer

CUDA graph stream capture with thrust::reduce

When I try to capture stream execution to build a CUDA graph, a call to thrust::reduce causes the runtime error cudaErrorStreamCaptureUnsupported: operation not permitted when stream is capturing. I have tried returning the reduction result to both…

cuda thrust cuda-streams cuda-graphs

asked Apr 01 '20 at 12:00

Cos_ma

0 votes, 1 answer

CUDA global atomic operations across concurrent kernel executions

My CUDA application performs an associative reduction over a volume. Essentially each thread computes values which are atomically added to overlapping locations of the same output buffer in global memory. Is it possible to concurrently launch this…

cuda atomic cuda-streams gpu-atomics

asked Aug 10 '19 at 02:35

AnimatedRNG

0 votes, 0 answers

Using cv::cuda::stream for asynchronous processing of images in opencv

I am using OpenCV 3.4 with the CUDA libraries to process video images. The image is grabbed and uploaded to the device using GpuMat::upload(). Afterward, the image is thresholded twice to create two different binary images (Th1 and Th2). My first question is…

c++ opencv parallel-processing cuda-streams

asked Jan 17 '19 at 10:44

Ali Nouri

0 votes, 1 answer

Is cuStreamAddCallback as effective as cuStreamSynchronize in having latest bits of data on host?

The CUDA driver API documentation says: "The start of execution of a callback has the same effect as synchronizing an event recorded in the same stream immediately prior to the callback. It thus synchronizes streams which have been 'joined' …

callback cuda cuda-streams

asked Feb 25 '18 at 17:29

huseyin tugrul buyukisik

0 votes, 1 answer

Asynchronous behavior of CUDA events within a CUDA stream

This question is about the notion of a CUDA stream (Stream) and an apparent anomaly with CUDA events (Event) recorded on a stream. Consider the following code demonstrating this anomaly: cudaEventRecord(eventStart, stream1) kernel1<<<...,…

cuda cuda-streams cuda-events

asked Dec 01 '17 at 06:13

kesari

0 votes, 1 answer

Enqueueing an async copy from a CUDA callback - not permitted?

This program: #include #include struct buffers_t { void* host_buffer; void* device_buffer; }; void ensure_no_error(std::string message) { auto status = cudaGetLastError(); if (status != cudaSuccess) { …

asynchronous cuda cuda-streams

asked Nov 01 '17 at 09:14

einpoklum

0 votes, 1 answer

CUDA streams performance

I am currently learning CUDA streams through the computation of a dot product between two vectors. The ingredients are a kernel function that takes in vectors x and y and returns a vector result of size equal to the number of blocks, where each…

cuda dot-product cuda-streams

asked Nov 12 '16 at 18:29

iNvId

0 votes, 1 answer

Kernel invoking delay on CUDA with Streams

I have implemented a scan algorithm in CUDA from scratch and am trying to use it for small inputs of less than 80,000 bytes. Two separate versions were created: one runs the kernels using streams where possible, and the other runs only in the…

cuda cuda-streams

asked Jun 08 '16 at 11:13

BAdhi

0 votes, 1 answer

Multiple kernel calls in CUDA

I'm trying to call the same kernel on CUDA multiple times (with one different input parameter each time), but only the first call executes and the subsequent kernel calls do not. Assume the input arrays are new_value0=[123.814935276; 234; 100; 166;…

c parallel-processing cuda cuda-streams

asked Nov 06 '15 at 10:20

adry_b89
