
OpenCV has a gpu::Stream class that encapsulates a queue of asynchronous calls. Some functions have overloads that take an additional gpu::Stream parameter. Aside from the gpu-basics-similarity.cpp sample code, there is very little information in the OpenCV documentation on how and when to use gpu::Stream. For example, it is not clear (to me) what exactly gpu::Stream::enqueueConvert or gpu::Stream::enqueueCopy do, or how to use gpu::Stream as the additional overload parameter. I'm looking for a tutorial-like overview of gpu::Stream.

Alexey

1 Answer


By default, all gpu module functions are synchronous, i.e. the current CPU thread is blocked until the operation finishes.

gpu::Stream is a wrapper for cudaStream_t and allows asynchronous, non-blocking calls. For details on asynchronous concurrent execution in CUDA, see the "CUDA C Programming Guide".

Most gpu module functions take an additional gpu::Stream parameter. If you pass a non-default stream, the function call is asynchronous and is added to the stream's command queue.
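To make the queueing behaviour concrete, here is a minimal CPU-only sketch of an in-order command queue. It is an analogy, not OpenCV's or CUDA's implementation: `ToyStream` and `runPipeline` are hypothetical names, a worker thread stands in for the GPU, and the three integer "steps" stand in for an upload, a filter, and a download.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical CPU-only stand-in for a stream; not part of OpenCV.
class ToyStream {
public:
    ToyStream() : busy_(false), stop_(false), worker_(&ToyStream::run, this) {}

    ~ToyStream() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // remaining tasks are drained before the worker exits
    }

    // Returns immediately; the task runs later on the worker, in FIFO order.
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(task));
        }
        cv_.notify_all();
    }

    // Blocks the caller until every queued task has finished.
    void waitForCompletion() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return queue_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stop requested and nothing left to run
            std::function<void()> task = std::move(queue_.front());
            queue_.pop_front();
            busy_ = true;
            lock.unlock();
            task();  // run the task without holding the lock
            lock.lock();
            busy_ = false;
            cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool busy_;
    bool stop_;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Mirrors an upload -> filter -> download sequence with plain integers.
std::vector<int> runPipeline() {
    std::vector<int> order;
    ToyStream stream;
    stream.enqueue([&order] { order.push_back(1); });  // stands in for an upload
    stream.enqueue([&order] { order.push_back(2); });  // stands in for a filter
    stream.enqueue([&order] { order.push_back(3); });  // stands in for a download
    // The calling thread is free to do unrelated work here.
    stream.waitForCompletion();  // only after this is `order` safe to read
    assert(order.size() == 3);
    return order;
}
```

Calling `runPipeline()` returns the steps in the order they were enqueued. The same two properties hold for a real stream: calls queued on one stream execute in order, and their results should only be read after synchronizing.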

gpu::Stream also provides methods for asynchronous memory transfers between CPU<->GPU and GPU<->GPU. However, CPU<->GPU asynchronous memory transfers work only with page-locked host memory. The gpu::CudaMem class encapsulates such memory.

Currently, you may face problems if the same operation is enqueued twice with different data to different streams. Some functions use constant or texture GPU memory, and the next call may update that memory before the previous one has finished. Calling different operations asynchronously is safe, because each operation has its own constant buffer. Memory copy/upload/download/set operations on buffers you hold are also safe.
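The hazard can be illustrated with a small CPU-only model. This is an assumption-laden sketch, not how any particular OpenCV function is written: `g_kernelParam` is a hypothetical stand-in for one operation's constant buffer, and `DeferredStream` is a toy queue that only runs its work when `flush()` is called, so the interleaving is deterministic.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the constant buffer owned by one operation.
static int g_kernelParam = 0;

// Toy stream (not OpenCV): queued work runs only when flush() is called.
struct DeferredStream {
    std::vector<std::function<void()>> queue;
    void flush() {
        for (auto& task : queue) task();
        queue.clear();
    }
};

// The parameter is written at enqueue time but read at execution time.
void enqueueScale(DeferredStream& s, int factor, int input, int& output) {
    g_kernelParam = factor;  // host-side write happens immediately
    s.queue.push_back([input, &output] { output = input * g_kernelParam; });
}

// Same operation on two streams, both in flight: the second enqueue
// overwrites the parameter before the first task has run.
std::pair<int, int> runOverlapped() {
    DeferredStream a, b;
    int outA = 0, outB = 0;
    enqueueScale(a, 2, 10, outA);
    enqueueScale(b, 3, 10, outB);
    a.flush();
    b.flush();
    return std::make_pair(outA, outB);
}

// Finishing the first call before enqueueing the second avoids the clash.
std::pair<int, int> runSerialized() {
    DeferredStream a, b;
    int outA = 0, outB = 0;
    enqueueScale(a, 2, 10, outA);
    a.flush();  // plays the role of waiting for the first stream to complete
    enqueueScale(b, 3, 10, outB);
    b.flush();
    return std::make_pair(outA, outB);
}
```

In this model `runOverlapped()` yields 30 for both outputs, because the first task sees the factor written by the second enqueue, while `runSerialized()` yields 20 and 30. On a real GPU the outcome depends on timing, so the problem may appear only intermittently.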

Here is a small sample:

// allocate page-locked memory
CudaMem host_src_pl(768, 1024, CV_8UC1, CudaMem::ALLOC_PAGE_LOCKED);
CudaMem host_dst_pl;

// get Mat header for CudaMem (no data copy)
Mat host_src = host_src_pl;

// fill mat on CPU
someCPUFunc(host_src);

GpuMat gpu_src, gpu_dst;

// create Stream object
Stream stream;

// next calls are non-blocking

// first upload data from host
stream.enqueueUpload(host_src_pl, gpu_src);
// perform blur
blur(gpu_src, gpu_dst, Size(5,5), Point(-1,-1), stream);
// download result back to host
stream.enqueueDownload(gpu_dst, host_dst_pl);

// call another CPU function in parallel with GPU
anotherCPUFunc();

// wait for the GPU to finish
stream.waitForCompletion();

// now you can use GPU results
Mat host_dst = host_dst_pl;
vinograd47
  • Thanks! So, in your example, gpu::Stream is used for function calls on GPU that are asynchronous from CPU function calls. But suppose that I have two independent functions (on GPU). Can I use two distinct gpu::Stream objects so that these functions will execute in parallel on a single GPU (similar to multi-threading)? – Alexey Jul 25 '13 at 13:58
  • when would one use multiple streams? – Alexey Jul 25 '13 at 14:46
  • Yes, you can use multiple streams. But, as I said, you may face problems if you call the same functions from different streams. – vinograd47 Jul 25 '13 at 15:01
  • How is this valid code? Wouldn't host_src_pl be empty? – Zypps987 Feb 16 '16 at 14:38
  • `stream.enqueueDownload` will allocate memory for `host_dst_pl`. – vinograd47 Feb 16 '16 at 18:03
  • Can you say how this is done in OpenCV 3.1? The members `enqueueUpload` and `enqueueDownload` no longer exist. – bweber Mar 19 '16 at 16:35
  • See http://stackoverflow.com/questions/36104433/opencv3-where-has-cvcudastreamenqueueupload-gone – vinograd47 Mar 20 '16 at 07:52