
I need to stitch five video streams on the fly. The cameras recording the videos are mounted onto a rack side by side and will never change their relative position to one another. Homography matrices are therefore static.
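
Concretely, "static" means each homography can be estimated once from a single calibration frame pair and then reused for every video frame. A minimal sketch (the point arrays here are placeholder data, purely for illustration):

import cv2
import numpy as np

# Matched points from one calibration frame pair (placeholder values)
pts_side = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]]).reshape(-1, 1, 2)
pts_center = np.float32([[30, 10], [660, 5], [655, 470], [25, 485]]).reshape(-1, 1, 2)

# Estimated once, at startup
H, _ = cv2.findHomography(pts_side, pts_center, cv2.RANSAC)

# Per frame, only the warp runs, never the feature matching:
# warped = cv2.warpPerspective(frame, H, (width, height))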

I'm following the approach from this GitHub repo:

Starting from the center image, you first stitch to the left, and then stitch the remaining images to the right.
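
Roughly, that order looks like this (my own sketch; stitch_pair stands in for the repo's warp-and-blend step, it is not its actual API):

def stitch_panorama(images, stitch_pair):
    # Start from the center image...
    center = len(images) // 2
    panorama = images[center]

    # ...stitch to the left first...
    for img in reversed(images[:center]):
        panorama = stitch_pair(img, panorama)

    # ...then stitch the remaining images to the right
    for img in images[center + 1:]:
        panorama = stitch_pair(panorama, img)

    return panorama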

The code from that repo works, but it is painfully slow. I've been able to improve its performance dramatically (by a factor of 300), but it still takes 0.25 seconds to stitch a panorama of five images (on a 2015 MacBook Pro).

The slow part is applying each result of cv2.warpPerspective(...) to the panorama stitched up to that point. I'm currently doing this by using the alpha channel to blend the two images, inspired by this SO answer. It is this blending that makes the stitching slow.

(Pseudo) code:

import cv2
import numpy as np


def blend_transparent(background, foreground):
    overlay_img = foreground[:, :, :3]   # Grab the BGR planes
    overlay_mask = foreground[:, :, 3:]  # And the alpha plane

    # Calculate the inverse mask
    background_mask = 255 - overlay_mask

    # Turn the masks into three channels, so we can use them as weights
    overlay_mask = cv2.cvtColor(overlay_mask, cv2.COLOR_GRAY2BGR)
    background_mask = cv2.cvtColor(background_mask, cv2.COLOR_GRAY2BGR)

    # Create a masked-out background image and a masked-out overlay.
    # The images are converted to floating point in range 0.0 - 1.0.
    background_part = (background * (1 / 255.0)) * (background_mask * (1 / 255.0))
    overlay_part = (overlay_img * (1 / 255.0)) * (overlay_mask * (1 / 255.0))

    # Finally, add them together and rescale back to an 8-bit integer image
    return np.uint8(
        cv2.addWeighted(background_part, 255.0, overlay_part, 255.0, 0.0)
    )


for image in right_images:
    warped_image = cv2.warpPerspective(image, ...)

    # Pad the panorama stitched so far onto a canvas the size of the
    # warped image, so the two can be blended pixel for pixel
    canvas = np.zeros(
        (warped_image.shape[0], warped_image.shape[1], 3),
        dtype="uint8"
    )
    canvas[0 : previously_stitched.shape[0], 0 : previously_stitched.shape[1]] = previously_stitched
    previously_stitched = blend_transparent(canvas, warped_image)

So my question is: Is there a more efficient way to apply the warped image to the existing panorama?

My full working code is in this repository.

Disclaimer: I'm a web developer and my knowledge on computer vision is very limited.

creimers
  • It is not clear to me which part is taking the time. Is it the call to cv2.warpPerspective(...) or the function blend_transparent? In your project you are using a patented algorithm that is not included in the default Python packages, which makes it harder to test your issue. – T.Lucas Jan 10 '19 at 10:38
  • The blending step is the time-consuming one. I updated the question. Regarding the patents: if you install opencv-python at the version pinned in `requirements.txt`, the algorithm in question is included. – creimers Jan 10 '19 at 10:48

1 Answer


An alpha channel is useful when your image actually has transparency, but here you add one manually just so you can blend. That channel could be used to store computation, but I think you would still lose performance. I suggest the following function for blend_transparent:

def blend_transparent(background, foreground):
    # Only the colour info is needed; the alpha plane is never read
    overlay_img = foreground[:, :, :3]  # Grab the BGR planes

    res = background  # note: background is modified in place

    # Pixels set only in the warped (right) image, and pixels set in both
    only_right = np.nonzero((np.sum(overlay_img, 2) != 0) * (np.sum(background, 2) == 0))
    left_and_right = np.nonzero((np.sum(overlay_img, 2) != 0) * (np.sum(background, 2) != 0))

    # Copy right-only pixels as-is; average the overlapping ones
    res[only_right] = overlay_img[only_right]
    res[left_and_right] = res[left_and_right] * 0.5 + overlay_img[left_and_right] * 0.5
    return res

Here the result takes the value of the right (warped) image wherever the panorama has no value set yet; where a value is already set, it takes the mean of the left and right values. This divides the computation time by a factor of 1.6.

Since your projection is frozen, there is no need to compute the indices only_right and left_and_right for every frame; compute them once and store them. Do this and you should divide the computation time by a factor of 4.
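
For example, the caching could look like this (a minimal sketch; the names mask_cache and step are just illustrative). Each stitching step in the loop has its own overlap region, so the cached indices are keyed per step:

import numpy as np

mask_cache = {}  # step index -> (only_right, left_and_right) index arrays

def blend_transparent_cached(background, foreground, step):
    overlay_img = foreground[:, :, :3]

    # Compute the overlap indices only on the first frame for this step
    if step not in mask_cache:
        fg_set = np.sum(overlay_img, 2) != 0
        bg_set = np.sum(background, 2) != 0
        mask_cache[step] = (
            np.nonzero(fg_set & ~bg_set),  # pixels only in the warped image
            np.nonzero(fg_set & bg_set),   # pixels present in both images
        )

    only_right, left_and_right = mask_cache[step]
    res = background  # modified in place, like the original
    res[only_right] = overlay_img[only_right]
    res[left_and_right] = res[left_and_right] * 0.5 + overlay_img[left_and_right] * 0.5
    return res

In the stitching loop you would pass the loop index as step, so that on every subsequent frame each pairwise blend reuses its own precomputed index arrays.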

T.Lucas