According to its documentation, OpenCV converts BGR images to grayscale using the linear transformation Y = 0.299 R + 0.587 G + 0.114 B. I tried to mimic this with NumPy by multiplying the HxWx3 BGR image matrix by the 3x1 vector of coefficients [0.114, 0.587, 0.299]', a multiplication that should yield an HxWx1 grayscale image matrix.
The NumPy code is as follows:

import cv2
import numpy as np
import time

im = cv2.imread(IM_PATHS[0], cv2.IMREAD_COLOR)
# Prepare destination grayscale memory
dst = np.zeros(im.shape[:2], dtype=np.uint8)
# BGR -> grayscale projection column vector
bgr_weight_arr = np.array((0.114, 0.587, 0.299), dtype=np.float32).reshape(3, 1)

for im_path in IM_PATHS:
    im = cv2.imread(im_path, cv2.IMREAD_COLOR)
    t1 = time.time()
    # NumPy multiplication comes here
    dst[:, :] = (im @ bgr_weight_arr).reshape(*dst.shape)
    t2 = time.time()
    print(f'runtime: {(t2 - t1):.3f}sec')
Using 12 MP images (4000x3000 pixels), the NumPy-powered process above typically takes around 90 ms per image, and that is without rounding the multiplication results. On the other hand, when I replace the matrix multiplication with OpenCV's function, dst[:, :] = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY), the typical runtime I get is around 5 ms per image, i.e. 18x faster!
Can anyone explain how that is possible? I have always been taught that NumPy uses all available acceleration techniques, such as SIMD. So how can OpenCV be so dramatically faster?
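For reference, the matrix product computes exactly the same numbers as an explicit per-channel weighted sum, so the gap is not about the math itself. A minimal pure-NumPy check on a toy image (the random input is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.integers(0, 256, size=(6, 8, 3), dtype=np.uint8)  # toy BGR image

w = np.array([0.114, 0.587, 0.299], dtype=np.float32)  # B, G, R weights

gray_matmul = (im @ w.reshape(3, 1))[..., 0]  # (6, 8, 3) @ (3, 1) -> (6, 8, 1)
gray_sum = (im[..., 0] * w[0]                 # same result, channel by channel
            + im[..., 1] * w[1]
            + im[..., 2] * w[2])

assert np.allclose(gray_matmul, gray_sum)
```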
Update:
Even when using quantized multiplications, NumPy's runtimes stay in the same range, around 90 ms:
...
bgr_weight_arr_uint16 = np.round(256 * np.array((0.114, 0.587, 0.299))).astype('uint16').reshape(3, 1)

for im_path in IM_PATHS:
    im = cv2.imread(im_path, cv2.IMREAD_COLOR)
    t1 = time.time()
    # NumPy multiplication comes here
    dst[:, :] = np.right_shift(im @ bgr_weight_arr_uint16, 8).reshape(*dst.shape)
    t2 = time.time()
    print(f'runtime: {(t2 - t1):.3f}sec')
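As a sanity check on the fixed-point weights: np.round(256 * [0.114, 0.587, 0.299]) gives [29, 150, 77], which sums to exactly 256, and the worst-case pixel 255 * (29 + 150 + 77) = 65280 still fits in uint16, so the right-shifted result stays within one gray level of the truncated float result. A minimal sketch, pure NumPy with a random toy image:

```python
import numpy as np

w_float = np.array([0.114, 0.587, 0.299])                  # B, G, R weights
w_fixed = np.round(256 * w_float).astype(np.uint16)        # [29, 150, 77], sums to 256

rng = np.random.default_rng(0)
im = rng.integers(0, 256, size=(6, 8, 3), dtype=np.uint8)  # toy BGR image

gray_float = im @ w_float                                  # float64 reference
gray_fixed = np.right_shift(im @ w_fixed, 8)               # fixed-point, >> 8 truncates

# The shift truncates like floor(); agreement is within one gray level
assert np.max(np.abs(np.floor(gray_float) - gray_fixed)) <= 1
```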