Running many OpenCV matrix multiplications on the GPU

Question

I'm using OpenCV for an image processing application. I'd like to accelerate a large number of matrix operations on fairly large matrices on the GPU, and I want to avoid coding directly in CUDA C if possible. OpenCV 3.4.2 has a number of GPU-accelerated functions, such as cuda::multiply, but each call accelerates only a single matrix operation. When I have many matrix operations to run, the total is still time-consuming.

My code is shown below. When I call the GPU functions from parallel CPU threads, GPU usage stays below 5%. Is there any way to improve this? Is there a way to run the GPU functions in parallel on the GPU itself?

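#include <opencv2/core.hpp>
#include <opencv2/cudaarithm.hpp>

cv::cuda::Stream stream;
const int size        = 3427680;
const int iteration   =  649319;

// Host matrices: two inputs and one zero-initialized output.
cv::Mat cpu_mat1 = cv::Mat(1, size, CV_32FC4, cv::Scalar(1));
cv::Mat cpu_mat2 = cv::Mat(1, size, CV_32FC4, cv::Scalar(1));
cv::Mat cpu_mat3 = cv::Mat::zeros(1, size, CV_32FC4);

// Copy everything to the GPU once, before the loop.
cv::cuda::GpuMat gpu_mat1;
gpu_mat1.upload(cpu_mat1);
cv::cuda::GpuMat gpu_mat2;
gpu_mat2.upload(cpu_mat2);
cv::cuda::GpuMat gpu_mat3;
gpu_mat3.upload(cpu_mat3);

// All OpenMP threads share the single stream and the single output matrix.
#pragma omp parallel for
for (int i = 0; i < iteration; i++)
{
    cv::cuda::multiply(gpu_mat1, gpu_mat2, gpu_mat3, 1.0, -1, stream);
    cv::cuda::sum(gpu_mat3);  // takes no stream argument; returns a cv::Scalar on the host
}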

You need to use a CUDA [stream](https://stackoverflow.com/questions/17842827/how-to-use-gpustream-in-opencv) for async operations; currently all your calls are blocking. - EdChum, Aug 15 '18 at 13:36
@EdChum Actually, I have used a CUDA stream for async operations, as shown in the code above. Is that right, or am I using it wrong? Thank you for your guidance! - 刘定坤, Aug 15 '18 at 14:08
You're supposed to `enqueue` on the stream and then call `waitForCompletion`. See related: https://stackoverflow.com/questions/17842827/how-to-use-gpustream-in-opencv - EdChum, Aug 15 '18 at 14:23
@EdChum That SO thread is no longer valid. You don't have to enqueue tasks on the stream anymore in newer versions of OpenCV. - Croolman, Aug 20 '18 at 05:22
