I'm doing live video processing on an ODROID XU4 (Samsung Exynos5422 SoC with a Mali-T628 GPU) using OpenCV 4.1 and Python 3.6. I'm able to use the GPU by converting the NumPy arrays containing my images to UMat, e.g.:
img_umat = cv2.UMat(img_array)
With that conversion, the image-processing code runs faster than it does on the CPU; however, the transfer to/from the GPU takes a long time (~0.03 s in some cases). Is there any way around this?
I am new to GPU programming and have been scratching my head over section 8.3 here. I don't know how the default `cv2.UMat(array)` constructor allocates memory, so I've tried to specify it explicitly, e.g.:
host_mat = cv2.UMat(mat, cv2.USAGE_ALLOCATE_HOST_MEMORY)
But when I do this, no error is thrown, yet host_mat comes back empty. Am I doing something wrong, or am I on the wrong path entirely? Any suggestions appreciated.