Suppose we have an ordinary OpenCV image stored in a cv::Mat:

cv::Mat usual_image = cv::imread(...);

This image is stored in memory as an HWC/NHWC array, that is, with interleaved channels.

Is it possible to copy this image into CUDA memory as CHW/NCHW (in effect, separate per-channel arrays) without paying the very high cost of cv::split?

Just for visualization, HWC versus CHW: [image]

Christoph Rackwitz
  • probably. `cv::split` is cpu-side, hence slow. afaik there's `cv::cuda::split` (device-side), which should be a lot cheaper, and there's probably ways to do the conversion *during* transfer too. some related recipes: https://stackoverflow.com/questions/41637162/how-cvmat-convert-to-nhcw-format and https://answers.opencv.org/question/191013/dnn-loading-of-models-and-order-of-channels/ -- when you use opencv's `dnn` module, it has various "blob" methods that should take care of this – Christoph Rackwitz Dec 30 '21 at 11:30
  • If you already have pre-allocated memory that you want to write into, cv::cuda::split will be even worse than the CPU version, because it involves an intermediate cudaMalloc, whose cost is huge. – SofaScience Dec 30 '21 at 11:41

1 Answer

The fastest way to do this will be to copy the image as-is to the GPU, then write a GPU kernel to split the data into 3 buffers.
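A minimal sketch of that approach, assuming an 8-bit image (for example CV_8UC3) and two device buffers that the caller has already allocated: a staging buffer for the interleaved copy and the planar destination. The names `hwc_to_chw_kernel` and `upload_hwc_as_chw` are hypothetical, not part of OpenCV or CUDA.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// One thread per pixel: read the interleaved channel values of that pixel
// and scatter them into `channels` tightly packed planes (CHW).
// src_pitch is the size of one interleaved source row in bytes.
__global__ void hwc_to_chw_kernel(const std::uint8_t* src, std::size_t src_pitch,
                                  std::uint8_t* dst, int width, int height,
                                  int channels)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    const std::uint8_t* pixel =
        src + static_cast<std::size_t>(y) * src_pitch
            + static_cast<std::size_t>(x) * channels;
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    for (int c = 0; c < channels; ++c)
        dst[c * plane + static_cast<std::size_t>(y) * width + x] = pixel[c];
}

// Hypothetical host helper. Copies the image as-is (HWC) into the staging
// buffer, then launches the kernel that writes CHW into d_chw.
// h_step is the host row stride in bytes (cv::Mat::step for a cv::Mat).
// Assumption: d_hwc_staging and d_chw each hold width*height*channels bytes.
cudaError_t upload_hwc_as_chw(const std::uint8_t* h_hwc, std::size_t h_step,
                              std::uint8_t* d_hwc_staging, std::uint8_t* d_chw,
                              int width, int height, int channels,
                              cudaStream_t stream = 0)
{
    const std::size_t row_bytes = static_cast<std::size_t>(width) * channels;

    // The 2D copy drops any row padding the host image may have.
    cudaError_t err = cudaMemcpy2DAsync(d_hwc_staging, row_bytes, h_hwc, h_step,
                                        row_bytes, height,
                                        cudaMemcpyHostToDevice, stream);
    if (err != cudaSuccess) return err;

    const dim3 block(32, 8);
    const dim3 grid((width + block.x - 1) / block.x,
                    (height + block.y - 1) / block.y);
    hwc_to_chw_kernel<<<grid, block, 0, stream>>>(d_hwc_staging, row_bytes,
                                                  d_chw, width, height, channels);
    return cudaGetLastError();  // reports launch errors only
}
```

With a cv::Mat this would be called roughly as `upload_hwc_as_chw(usual_image.data, usual_image.step, d_staging, d_chw, usual_image.cols, usual_image.rows, usual_image.channels())`. Both device buffers are passed in so they can be allocated once and reused across frames, which avoids the repeated allocation cost raised in the comments.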

A slower alternative would be to use 3 calls to cudaMemcpy2D to copy the data from host to device, one call per plane.
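The per-plane variant might look like the sketch below. It assumes an 8-bit image whose rows are contiguous in host memory (for a cv::Mat, `isContinuous()` returns true) and a pre-allocated, tightly packed device buffer. Each call treats every pixel as a one-byte "row" with a source pitch of `channels` bytes, so a single cudaMemcpy2D gathers one channel. The function name `copy_hwc_to_chw_planes` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: one strided host-to-device copy per channel plane.
// Assumptions: 8-bit samples, no row padding in h_hwc, and d_chw holds
// width*height*channels bytes.
cudaError_t copy_hwc_to_chw_planes(const std::uint8_t* h_hwc,
                                   std::uint8_t* d_chw,
                                   int width, int height, int channels)
{
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    for (int c = 0; c < channels; ++c) {
        cudaError_t err = cudaMemcpy2D(
            d_chw + c * pixels,  // start of destination plane c
            1,                   // dpitch: planes are tightly packed bytes
            h_hwc + c,           // first sample of channel c
            channels,            // spitch: bytes from one pixel to the next
            1,                   // width: one byte copied per "row"
            pixels,              // height: one "row" per pixel
            cudaMemcpyHostToDevice);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

This needs no custom kernel and no staging buffer, but a copy with one-byte rows is a poor fit for the copy engine, which is consistent with the answer calling it the slower option.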

Robert Crovella