
This sounds like an easy task, but I have already spent hours on it. There are several posts with a similar headline, so let me describe my problem first. I have H264-encoded video files; they are recordings of colonoscopies/gastroscopies.

During the examination, the examiner can take some kind of screenshot. You can see this in the video because for roughly one second the image is not moving, so a couple of frames show the "same" content. I'd like to know when those screenshots are taken.

So first I extracted the images from the video:

ffmpeg -hwaccel cuvid -c:v h264_cuvid -i '/myVideo.mkv' -vf "hwdownload,format=nv12" -start_number 0 -vsync vfr -q:v 1 '/myFrames/%05d.jpg'

This works just fine and the result is a folder with all the images in high quality. The idea now is to compare image x and image x+1 (or x+y) and check whether they are the "same"; if so, a screenshot was taken. If I look at those images, they really do look the same and I cannot tell the difference, but the computer can.

Since those images have been compressed/encoded, they are lossy. I guess that, depending on the key frames in the video encoding process, the difference between those "identical" images is sometimes 0 and sometimes "huge". So much for the problem, time for a little code:

#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

//  imagePaths, rect (a pointer to the ROI cv::Rect I crop to), max (binarization threshold),
//  medSize (median kernel size), start and end are defined elsewhere in my class

//  init mPrev with the last element
cv::Mat mPrev = cv::imread(imagePaths[imagePaths.size() - 1])(*rect).clone();
cv::cvtColor(mPrev, mPrev, cv::COLOR_BGR2GRAY);
//  remove smaller noise
cv::medianBlur(mPrev, mPrev, 5);
//  create binary image: keep only the light reflections (landmarks), everything else is too dark
mPrev.setTo(0, mPrev < max);
mPrev.setTo(255, mPrev >= max);

cv::Mat diff;

std::vector<int> screenShotVec;
for (int k = start; k < end; k++) {
    cv::Mat mat = cv::imread(imagePaths[k])(*rect).clone();
    cv::cvtColor(mat, mat, cv::COLOR_BGR2GRAY);
    //  remove smaller noise
    cv::medianBlur(mat, mat, medSize);
    //  create binary image: keep only the light reflections (landmarks), everything else is too dark
    mat.setTo(0, mat < max);
    mat.setTo(255, mat >= max);

    double d = cv::sum(mat)[0];

    //  if the image is totally black it is not a screenshot, since parts of interest always have light reflections
    if (d > 0) {
        //  get the difference of the binary images
        cv::absdiff(mPrev, mat, diff);
        //  remaining differences should be very small and easy to remove with a median blur
        cv::medianBlur(diff, diff, 9);
        d = cv::sum(diff)[0];

        //  no difference left, it is a screenshot
        if (d == 0) {
            screenShotVec.push_back(k);
        }
    }
    //  keep the current frame for the next round
    mPrev = mat.clone();
}
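
For completeness: the imagePaths vector in the snippet above is simply the list of extracted frames; something like cv::glob can collect them (illustrative only, not necessarily the code I use):

#include <opencv2/core/utility.hpp>   // cv::glob
#include <vector>

//  illustrative only: collect the extracted frames (cv::glob returns the paths sorted)
std::vector<cv::String> imagePaths;
cv::glob("/myFrames/*.jpg", imagePaths, false);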

This is the code which has worked best so far. But it is not very stable: I have videos from many different endoscope processors, cameras and grabbers, so I have to adjust it every time. For example, this is the result if I cv::subtract two pseudo-identical frames: [image]

while this is the result if I cv::subtract two frames with very small camera movement: [image]

Most of the time the camera is moving very fast, and since I compare frame x with frame x+y (y >= 5) the differences are more obvious; the problem is when the camera is not moving fast. In addition to cv::subtract I tried several kernel sizes for the median blur, I tried to detect only edges with Canny and compare those, and I used the cv::norm function.
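
For reference, the Canny variant could be sketched roughly like this (a minimal sketch; the Canny thresholds and the final threshold are only placeholders, not the values I actually tested):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

//  sketch only: compare the edge maps of two grayscale frames,
//  all threshold values below are placeholders
bool framesLookIdentical(const cv::Mat &prevGray, const cv::Mat &curGray,
                         double edgeDiffThreshold = 1000.0) {
    cv::Mat edgesPrev, edgesCur;
    cv::Canny(prevGray, edgesPrev, 50, 150);
    cv::Canny(curGray, edgesCur, 50, 150);
    //  L1 norm of the per-pixel difference of the edge maps,
    //  a small value means the frames are (pseudo) identical
    double d = cv::norm(edgesPrev, edgesCur, cv::NORM_L1);
    return d < edgeDiffThreshold;
}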

Does anyone have a recommendation how I can transform or measure those pseudo-identical frames into something that is identical, while frames that show real changes remain distinguishable?

P.S. I am sorry I can't post the real images, since this is medical data.

  • What do you think about comparing the histograms? It might be useful, but I'm not sure about the histogram results when the camera is not moving fast. – badcode Jun 08 '21 at 23:04
  • Hi @badcode, the histogram is a fast method, but sadly it also looks very similar when the camera is not moving fast. – user2267367 Jun 11 '21 at 05:18
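
For anyone who wants to try it, the histogram comparison from the comments could look roughly like this (a minimal sketch with an illustrative bin count and metric; as said above, it did not separate slow camera movement from a real screenshot in my case):

#include <opencv2/imgproc.hpp>

//  sketch only: compare grayscale histograms of two frames
double histogramSimilarity(const cv::Mat &gray1, const cv::Mat &gray2) {
    int histSize = 256;
    float range[] = { 0, 256 };
    const float *histRange = range;
    cv::Mat hist1, hist2;
    cv::calcHist(&gray1, 1, 0, cv::Mat(), hist1, 1, &histSize, &histRange);
    cv::calcHist(&gray2, 1, 0, cv::Mat(), hist2, 1, &histSize, &histRange);
    cv::normalize(hist1, hist1, 0, 1, cv::NORM_MINMAX);
    cv::normalize(hist2, hist2, 0, 1, cv::NORM_MINMAX);
    //  correlation close to 1.0 means "very similar"
    return cv::compareHist(hist1, hist2, cv::HISTCMP_CORREL);
}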

1 Answer


After several tests I finally found something which works for me: feature matching, which was already discussed here on Stack Overflow back in 2013. There are several matching algorithms available in OpenCV. As a basis I took the code of this tutorial, made a few changes, and this is the result (OpenCV 4.5.2):

#include <string>
#include <vector>
#include <cmath>
#include <opencv2/core/mat.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/xfeatures2d.hpp>
#include <opencv2/imgproc.hpp>

using namespace std;
using namespace cv;
using namespace cv::xfeatures2d;

void Detector::run(vector<string> imagePaths) {

    Mat mPrev = cv::imread(imagePaths[imagePaths.size() - 1]);
    cvtColor(mPrev, mPrev, cv::COLOR_BGR2GRAY);
    cv::medianBlur(mPrev, mPrev, 5);

    // start and end are set elsewhere in my class
    for (int k = start; k < end; k++) {
        Mat mat = cv::imread(imagePaths[k]);
        cvtColor(mat, mat, cv::COLOR_BGR2GRAY);
        medianBlur(mat, mat, 5);

        if (areImageFeaturesTheSame(mPrev, mat)) {
            //yes, it is the same image
        }

        mPrev = mat.clone();
    }
}

bool Detector::areImageFeaturesTheSame(cv::Mat img1, cv::Mat img2) {

    //threshold to check whether the feature coordinates in the left image
    //are almost the same as in the right one
    const float xyThreshold = 0.1;

    //-- Step 1: Detect the keypoints using the SURF detector and compute the descriptors
    int minHessian = 800; //a smaller value finds more features, a bigger value fewer
    Ptr<SURF> detector = SURF::create(minHessian);
    std::vector<KeyPoint> keypoints1, keypoints2;
    Mat descriptors1, descriptors2;

    detector->detectAndCompute(img1, noArray(), keypoints1, descriptors1);
    detector->detectAndCompute(img2, noArray(), keypoints2, descriptors2);

    //-- Step 2: Match the descriptor vectors with a FLANN based matcher
    // Since SURF is a floating-point descriptor, NORM_L2 is used
    Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create(cv::DescriptorMatcher::FLANNBASED);
    std::vector<DMatch> matches;

    //no features -> the image is not a screenshot
    if (descriptors1.empty() || descriptors2.empty()) {
        return false;
    }

    matcher->match(descriptors1, descriptors2, matches);
    //-- threshold for the distance of a match
    const float distanceThresh = 0.5;

    int matchCount = 0;

    for (size_t i = 0; i < matches.size(); i++) {
        //check if the distance of the match is small and if the x and y coordinates in the left
        //and right image are almost the same; keep in mind that a feature can be slightly
        //different even in an image which looks the same to a human
        if (matches[i].distance < distanceThresh &&
            std::abs(keypoints1[matches[i].queryIdx].pt.x - keypoints2[matches[i].trainIdx].pt.x) < xyThreshold &&
            std::abs(keypoints1[matches[i].queryIdx].pt.y - keypoints2[matches[i].trainIdx].pt.y) < xyThreshold) {
            matchCount++;
        }
    }

    //require at least 18 of those features, this works fine for me
    //but depending on the image you may need another value here
    return matchCount > 18;
}

Remember that most of the code is from here. And this is the idea: I check whether the features are the same and at the same position; if this is true I consider it the same image. Since I have extracted the frames from a video, consecutive frames are very similar to each other, so I compare only every x-th (10th) image. This makes the differences more obvious. Here is an example of two images which are the same; I bumped minHessian up to 20000 so there are not too many features:

[image: matched features between two identical frames]

You can see that all matched features are at the same position left and right; the lines are straight. And here you can see how the feature matching looks if there are 20 frames of difference: [image: matched features between two frames 20 frames apart] You can see that most of the matched feature lines go slightly up/down. If I stacked the images vertically, you would probably see the same on the y-axis. Some matched features are at a totally different location, simply because another feature in the new right image looks very similar. In my code the if with xyThreshold takes care of all of this: since the value of 0.1 is so small, a matched feature must be at the same location in the left and the right image.
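
In case someone wants to reproduce such a picture: the tutorial draws the matches with cv::drawMatches; a minimal sketch (not the exact code I used for the images above):

#include <vector>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>

//  sketch only: draw the matched features side by side like in the images above
void showMatches(const cv::Mat &img1, const std::vector<cv::KeyPoint> &keypoints1,
                 const cv::Mat &img2, const std::vector<cv::KeyPoint> &keypoints2,
                 const std::vector<cv::DMatch> &matches) {
    cv::Mat out;
    cv::drawMatches(img1, keypoints1, img2, keypoints2, matches, out,
                    cv::Scalar::all(-1), cv::Scalar::all(-1), std::vector<char>(),
                    cv::DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS);
    cv::imshow("matches", out);
    cv::waitKey();
}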

One downside: the whole thing is relatively slow if you iterate over an entire video. I am considering training an AI for that. I already tried the OpenCV CUDA version of the matcher, but it was not really faster in my specific case.
