I have a folder with short videos and a folder with images. Most of the images are screenshots from one of the videos, but they may not match their source frame exactly (different size, noise, loss of detail due to compression, etc.). My goal is to match every image with the video it was taken from. So far, I use the OpenCV library to load one video and calculate the SSIM score between each video frame and each image, storing the highest SSIM score for every image. Then I take the image with the highest SSIM score, associate it with the video, and run the function again for the second video.

Here is my code:

import cv2
import numpy as np
from skimage.measure import compare_ssim  # renamed to skimage.metrics.structural_similarity in newer scikit-image
import sqlite3

conn = sqlite3.connect("matches.db")  # database path is illustrative
c = conn.cursor()

# screenshots - list of dicts: dict(id=screenshot id, image=JPEG image data)
# video_file  - str - path to the video file
def generate_matches(screenshots, video_file):
    for screenshot in screenshots:
        # decode the JPEG bytes straight to a grayscale image
        screenshot["cv_img"] = cv2.imdecode(np.frombuffer(screenshot["image"], np.uint8), 0)
        screenshot["best_match"] = dict(score=0, frame=0)
        screenshot.pop('image', None)  # remove the JPEG data from RAM

    vidcap = cv2.VideoCapture(video_file)
    success, image = vidcap.read()
    count = 1
    while success:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        for screenshot in screenshots:
            # resize the frame down to the screenshot's (width, height)
            c_image = cv2.resize(image, screenshot["cv_img"].shape[1::-1])
            score = compare_ssim(screenshot["cv_img"], c_image, full=False)
            if score > screenshot["best_match"]["score"]:
                screenshot["best_match"] = dict(score=score, frame=count)
        count += 1
        success, image = vidcap.read()

        if count % 500 == 0:
            print("Frame {}".format(count))

    print("Last Frame {}".format(count))
    for screenshot in screenshots:
        c.execute("INSERT INTO matches(screenshot_id, file, match, frame) VALUES (?,?,?,?)",
                  (screenshot["id"], video_file, screenshot["best_match"]["score"], screenshot["best_match"]["frame"]))

generate_matches(list_of_screenshots, "video1.mp4")
generate_matches(list_of_screenshots, "video2.mp4")
...

This algorithm seems to be good enough to associate images with videos, but it is quite slow, even when I use more threads. Is there any way to make it faster? Maybe a different algorithm, or some pre-processing of the videos and images? I'd be glad for any ideas!

velblúd
  • Rather than resizing every frame of every video once per screenshot, wouldn't it make more sense to resize the screenshots once to match the size of the video? – Dan Mašek Dec 27 '17 at 22:15
  • @DanMašek Maybe, I will try it. The screenshots have a lower resolution than the video frames, so I thought that using the smaller resolution would make the SSIM calculation faster. – velblúd Dec 27 '17 at 22:24
  • Why not use a perceptual hash (img -> 128 bits, for example) and then efficient k-d trees / ball trees or similar for searching/lookup? – sascha Dec 27 '17 at 23:00
  • @DanMašek I tested it using 10 screenshots in a list, and resizing every frame was about 5x faster than resizing the screenshots to the video resolution. (screenshot resolution - 224x450, video resolution - 720x1280) – velblúd Dec 27 '17 at 23:03
  • @sascha Wow, I had no idea that these hashes existed, and I really like the way this would work. I'll test it tomorrow to see if it gives me good results. Is OpenCV's pHash suitable for this task? – velblúd Dec 27 '17 at 23:23
  • @velblúd No experience with it. I once implemented a higher-order-stats-based pHash myself and it worked for my task. If accuracy is all that matters, you can at least use those hashes as a heuristic for a more accurate comparison (if your other approach works). – sascha Dec 27 '17 at 23:25
  • @velblúd OK, I didn't realize the screenshots were that much smaller. In that case it does make sense (and it seems the comparison is the more expensive step). If some of the screenshots are the same size, you could still cut some time by grouping them and doing one resize per size. – Dan Mašek Dec 28 '17 at 00:13
  • Example screenshots and video frames are needed. Since they are not the same size but show almost the same scene, extracting and matching features (like SIFT/SURF/ORB) may help. – Kinght 金 Dec 28 '17 at 01:52

1 Answer


Based on sascha's suggestion, I calculated dHashes (source) of all frames in the videos and dHashes of all screenshots, and compared them using the Hamming distance (source).

def dhash(image, hashSize=16): #hashSize=16 worked best for me
    # resize the input image, adding a single column (width) so we
    # can compute the horizontal gradient
    resized = cv2.resize(image, (hashSize + 1, hashSize))

    # compute the (relative) horizontal gradient between adjacent
    # column pixels
    diff = resized[:, 1:] > resized[:, :-1]

    # convert the difference image to a hash
    return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])

def hamming(a, b):
    return bin(a ^ b).count('1')
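
For completeness, here is a minimal sketch of how these two helpers could replace the SSIM loop from the question; the function name and the wiring are my own, not part of the original answer. Note that with hashes the best match is the lowest Hamming distance, whereas with SSIM it was the highest score.

def generate_matches_dhash(screenshots, video_file):
    # precompute one dHash per screenshot (each "cv_img" is already grayscale)
    for screenshot in screenshots:
        screenshot["hash"] = dhash(screenshot["cv_img"])
        # hashSize=16 yields a 256-bit hash, so 257 is worse than any real distance
        screenshot["best_match"] = dict(distance=257, frame=0)

    vidcap = cv2.VideoCapture(video_file)
    success, image = vidcap.read()
    count = 1
    while success:
        # hash the frame once, then compare it against every screenshot hash
        frame_hash = dhash(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))
        for screenshot in screenshots:
            distance = hamming(screenshot["hash"], frame_hash)
            if distance < screenshot["best_match"]["distance"]:
                screenshot["best_match"] = dict(distance=distance, frame=count)
        count += 1
        success, image = vidcap.read()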

This solution is fast and precise enough for my needs. The results would most likely improve with a different hashing function (e.g. OpenCV's pHash), but I couldn't find one in the OpenCV Python bindings.
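
For reference, newer opencv-contrib-python builds do expose a pHash implementation through the img_hash module; a minimal sketch, assuming that package is installed, with phash/phash_distance being my own helper names:

import cv2

hasher = cv2.img_hash.PHash_create()  # requires opencv-contrib-python

def phash(image):
    # compute() returns a 1x8 uint8 array, i.e. a 64-bit perceptual hash
    return hasher.compute(image)

def phash_distance(hash_a, hash_b):
    # compare() returns the Hamming distance between the two hashes
    return hasher.compare(hash_a, hash_b)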

velblúd