cudaMemcpy image data with conversion from HWC to CHW

Question

Suppose we have an ordinary OpenCV image stored in a cv::Mat:

cv::Mat usual_image = cv::imread(...);

This image is stored in memory as an HWC/NHWC array, i.e. with the channels interleaved per pixel.

Is it possible to copy this image into CUDA memory as CHW/NCHW (that is, as separate per-channel arrays) without the very costly cv::split?

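Just to visualize HWC versus CHW for a 3-channel image: HWC interleaves the channels per pixel (B G R B G R ...), while CHW stores each channel as its own contiguous plane (B B ... G G ... R R ...).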

Probably. `cv::split` runs on the CPU, hence it is slow. AFAIK there's `cv::cuda::split` (device-side), which should be a lot cheaper, and there are probably ways to do the conversion *during* the transfer too. Some related recipes: https://stackoverflow.com/questions/41637162/how-cvmat-convert-to-nhcw-format and https://answers.opencv.org/question/191013/dnn-loading-of-models-and-order-of-channels/. When you use OpenCV's `dnn` module, it has various "blob" methods that should take care of this. — Christoph Rackwitz, Dec 30 '21 at 11:30
If you already have pre-allocated memory to write into, cv::cuda::split will be even worse than the CPU version, because it involves an intermediate device allocation (cudaMalloc), whose cost is huge. — SofaScience, Dec 30 '21 at 11:41

score 1 · Answer 1 · answered Dec 30 '21 at 15:06

The fastest way to do this will be to copy the image as-is to the GPU, then write a GPU kernel to split the data into 3 buffers.
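As a rough illustration of that approach, here is a minimal sketch of such a kernel. It assumes 8-bit channels and a tightly packed image (`cv::Mat::isContinuous()` returns true, which is normally the case right after `cv::imread`). The names `hwc_to_chw_kernel`, `hwc_to_chw_gpu` and `check` are hypothetical helpers made up for this example, and the allocations and the copy back to the host are there only so the sketch is self-contained. In real code you would write into your own pre-allocated device buffer.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// One thread per pixel: read the interleaved samples of pixel p and
// scatter them into `channels` separate planes of num_pixels bytes each.
__global__ void hwc_to_chw_kernel(const uint8_t* __restrict__ hwc,
                                  uint8_t* __restrict__ chw,
                                  int num_pixels, int channels)
{
    const int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_pixels) return;
    for (int c = 0; c < channels; ++c) {
        chw[static_cast<size_t>(c) * num_pixels + p] =
            hwc[static_cast<size_t>(p) * channels + c];
    }
}

// Hypothetical wrapper: upload the image as-is with a single cudaMemcpy,
// split it on the device, and return the planar result to the host so it
// can be inspected. Assumes 8-bit samples and no row padding.
std::vector<uint8_t> hwc_to_chw_gpu(const uint8_t* host_hwc,
                                    int height, int width, int channels)
{
    assert(host_hwc != nullptr && height > 0 && width > 0 && channels > 0);
    const int num_pixels = height * width;
    const size_t bytes = static_cast<size_t>(num_pixels) * channels;

    uint8_t* d_hwc = nullptr;
    uint8_t* d_chw = nullptr;
    check(cudaMalloc(&d_hwc, bytes), "cudaMalloc(d_hwc)");
    check(cudaMalloc(&d_chw, bytes), "cudaMalloc(d_chw)");

    // Single contiguous host-to-device transfer of the interleaved data.
    check(cudaMemcpy(d_hwc, host_hwc, bytes, cudaMemcpyHostToDevice),
          "cudaMemcpy H2D");

    const int threads = 256;
    const int blocks = (num_pixels + threads - 1) / threads;
    hwc_to_chw_kernel<<<blocks, threads>>>(d_hwc, d_chw, num_pixels, channels);
    check(cudaGetLastError(), "kernel launch");

    // Copy back only for verification; this cudaMemcpy also waits for the kernel.
    std::vector<uint8_t> chw(bytes);
    check(cudaMemcpy(chw.data(), d_chw, bytes, cudaMemcpyDeviceToHost),
          "cudaMemcpy D2H");

    check(cudaFree(d_hwc), "cudaFree(d_hwc)");
    check(cudaFree(d_chw), "cudaFree(d_chw)");
    return chw;
}
```

With the question's image this could be called as `hwc_to_chw_gpu(usual_image.data, usual_image.rows, usual_image.cols, usual_image.channels())`. Adjacent threads write adjacent bytes within each plane, so the stores should coalesce well. Other element types (e.g. float) would need the pointer types changed accordingly.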

A slower alternative would be to use 3 calls to cudaMemcpy2D to copy the data from host to device, one call per plane.
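One way to read "one call per plane" is as a strided copy: each pixel is treated as a one-byte "row" whose source pitch is the number of channels. The sketch below follows that reading and assumes 8-bit channels, a continuous `cv::Mat`, and a destination buffer of height * width * channels bytes that has already been allocated on the device. The function name `upload_hwc_as_chw` is a hypothetical helper, not part of CUDA or OpenCV.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: copy an interleaved 8-bit HWC host image into a
// pre-allocated planar CHW device buffer using one strided cudaMemcpy2D
// per channel. Returns the first CUDA error encountered, if any.
cudaError_t upload_hwc_as_chw(uint8_t* d_chw, const uint8_t* host_hwc,
                              int height, int width, int channels)
{
    assert(d_chw != nullptr && host_hwc != nullptr);
    assert(height > 0 && width > 0 && channels > 0);
    const size_t num_pixels = static_cast<size_t>(height) * width;

    for (int c = 0; c < channels; ++c) {
        cudaError_t err = cudaMemcpy2D(
            d_chw + c * num_pixels,  // dst: start of plane c on the device
            1,                       // dpitch: plane samples are densely packed
            host_hwc + c,            // src: first sample of channel c
            channels,                // spitch: bytes between consecutive pixels
            1,                       // width: one byte copied per "row"
            num_pixels,              // height: one "row" per pixel
            cudaMemcpyHostToDevice);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

Because every "row" carries a single byte, the transfer cannot use large contiguous bursts, which fits with this being the slower of the two options. For wider element types, the width and both pitches would scale by the element size in bytes.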

Robert Crovella
