
I am creating a program to stabilize a video stream. At the moment my program works based on the phase correlation algorithm: I calculate the offset between two images (a base frame and the current frame) and then shift the current image according to the new coordinates. The program works, but the result is not satisfactory: in the linked videos below you can see that the processed video still appears shaky, and in places the shaking of the whole video is even worse than the original.
Original video
Stabilized video
Here is my current implementation:
Calculating the offset between images:

#include <opencv2/opencv.hpp>
using namespace cv;

// "one" is converted on every call; "two" is converted in place only once
// and the converted copy is kept by the caller, so the base frame is
// passed as "two".
Point2d calculate_offset_phase_optimized(Mat one, Mat& two) {

  // phaseCorrelate() needs single-channel floating-point input
  if(two.type() != CV_64F) {
    cvtColor(two, two, CV_BGR2GRAY);
    two.convertTo(two, CV_64F);
  }

  cvtColor(one, one, CV_BGR2GRAY);
  one.convertTo(one, CV_64F);

  return phaseCorrelate(one, two);
}

Shifting the image by this offset:

// The clamp helpers are defined below, so declare them first
int _0(const int x);
int _0ia(const int x);

// Copies the overlapping region of img into trans, displaced by offset;
// the border that the shift uncovers stays black
void move_image_roi_alt(Mat& img, Mat& trans, const Point2d& offset) {

  trans = Mat::zeros(img.size(), img.type());
  img(
    Rect(
        _0(static_cast<int>(offset.x)),
        _0(static_cast<int>(offset.y)),
        img.cols - abs(static_cast<int>(offset.x)),
        img.rows - abs(static_cast<int>(offset.y))
    )
  ).copyTo(trans(
    Rect(
        _0ia(static_cast<int>(offset.x)),
        _0ia(static_cast<int>(offset.y)),
        img.cols - abs(static_cast<int>(offset.x)),
        img.rows - abs(static_cast<int>(offset.y))
    )
  ));
}

// Clamps negative values to zero (origin of the source ROI)
int _0(const int x) {
  return x < 0 ? 0 : x;
}

// Returns |x| for negative x, else zero (origin of the destination ROI)
int _0ia(const int x) {
  return x < 0 ? abs(x) : 0;
}
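
For completeness, this is roughly how I drive these two functions (a simplified sketch of my main loop; the file name is just a placeholder, and I use the first frame as the fixed base):

int main() {
  VideoCapture cap("input.avi"); // placeholder file name
  if(!cap.isOpened()) return -1;

  Mat base, frame, stabilized;
  cap >> base; // the first frame is the fixed reference; it is converted
               // to grayscale CV_64F in place on the first call and the
               // conversion is reused on every later call

  while(cap.read(frame)) {
    Point2d offset = calculate_offset_phase_optimized(frame, base);
    move_image_roi_alt(frame, stabilized, offset);

    imshow("stabilized", stabilized);
    if(waitKey(30) == 27) break; // Esc to quit
  }
  return 0;
}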

I was looking through the paper by the authors of the YouTube stabilizer, and their algorithm based on corner detection seemed attractive, but I do not entirely understand how it works. So my question is: how can this problem be solved effectively? One of the conditions is that the program will run on slower computers, so heavy algorithms may not be suitable.
Thanks!
P.S. I apologize for any mistakes in the text - it is an automatic translation.

iRomul
  • What kind of videos are you targeting? Just artificial images (where the scene is in fact a plane) or real videos where pixels may be at different depth? And what movements do you want to correct? Smooth motions are mostly desired, but movements with a high accelerations are usually noise. – Nico Schertler May 20 '14 at 21:59
  • Here is an example of the kind of video I am targeting: http://www.youtube.com/watch?v=Ta8w_nzuMkU And the result of my current stabilizer: http://www.youtube.com/watch?v=-0p-uJEacVI The highest priority is eliminating shakes in planar camera movement. Rotation and scale are optional. – iRomul May 20 '14 at 23:22
  • I could imagine that the scene's depth could be a real problem (far pixels won't move as much as near pixels). I have no idea how this is usually done, but here is how I would do it: Estimate the 3D position of each point using two or more images. Estimate the 3D camera movement, too. Smooth the camera path (e.g. using a box-filter) and re-render the scene, filling any holes that might come up. I am not sure if pure translations will be enough. – Nico Schertler May 21 '14 at 07:31

2 Answers


You can use image descriptors such as SIFT in each frame and calculate robust matches between the frames. Then you can calculate a homography between the frames and use it to align them. Using sparse features can lead to a faster implementation than a dense correlation.
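
A rough sketch of that pipeline in OpenCV could look like the following (ORB stands in for SIFT here, since SIFT lives in the nonfree module; the keypoint count and the ratio-test threshold are illustrative choices, not part of any fixed recipe):

#include <opencv2/opencv.hpp>
#include <opencv2/features2d/features2d.hpp>

// Aligns cur to ref via sparse feature matches and a robust homography
cv::Mat align_to_reference(const cv::Mat& ref, const cv::Mat& cur) {
  cv::Mat g_ref = ref, g_cur = cur;
  if(ref.channels() == 3) cv::cvtColor(ref, g_ref, CV_BGR2GRAY);
  if(cur.channels() == 3) cv::cvtColor(cur, g_cur, CV_BGR2GRAY);

  // Detect keypoints and compute binary descriptors
  cv::ORB orb(500); // up to 500 keypoints per frame
  std::vector<cv::KeyPoint> kp_ref, kp_cur;
  cv::Mat desc_ref, desc_cur;
  orb(g_ref, cv::noArray(), kp_ref, desc_ref);
  orb(g_cur, cv::noArray(), kp_cur, desc_cur);
  if(desc_ref.empty() || desc_cur.empty()) return cur.clone();

  // Keep only matches that pass Lowe's ratio test
  cv::BFMatcher matcher(cv::NORM_HAMMING);
  std::vector<std::vector<cv::DMatch> > knn;
  matcher.knnMatch(desc_cur, desc_ref, knn, 2);

  std::vector<cv::Point2f> pts_cur, pts_ref;
  for(size_t i = 0; i < knn.size(); ++i) {
    if(knn[i].size() == 2 && knn[i][0].distance < 0.75f * knn[i][1].distance) {
      pts_cur.push_back(kp_cur[knn[i][0].queryIdx].pt);
      pts_ref.push_back(kp_ref[knn[i][0].trainIdx].pt);
    }
  }
  if(pts_cur.size() < 4) return cur.clone(); // homography needs 4+ points

  // Robust homography from the current frame to the reference, then warp
  cv::Mat H = cv::findHomography(pts_cur, pts_ref, CV_RANSAC);
  cv::Mat aligned;
  cv::warpPerspective(cur, aligned, H, ref.size());
  return aligned;
}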

Alternatively, if you know the camera parameters, you can calculate the 3D positions of the points and of the cameras and reproject the images onto a stable projection plane. As a side product, you also get a sparse 3D reconstruction of the scene (somewhat imprecise; it usually needs to be optimized to be usable). This is what e.g. Autostitch would do, but it is quite difficult to implement.

Note that the camera parameters can also be calculated, but that is even more difficult.

the swine
  • Thanks for your answer! I am new to this field, so there are some things I do not understand yet. Here is my code: `detector.detect(base_frame, keypoints_base); detector.detect(current_frame, keypoints_cur); extractor.compute(..., keypoints_base, descriptors_base); extractor.compute(..., keypoints_cur, descriptors_cur); matcher.knnMatch(descriptors_base, descriptors_cur, matches, 2);` Then I filter the best matches and do this: `homography = findHomography(k_base, k_cur, CV_RANSAC);` But what should the next step be? warpAffine does not take this 3x3 homography, only 2x3. – iRomul May 21 '14 at 10:17
  • Use `warpPerspective` instead (see http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#warpperspective). – the swine May 21 '14 at 10:37
  • I tried to implement this algorithm, but the results were unsatisfactory: [video](http://www.youtube.com/watch?v=DUZ7wDplDHQ&feature=youtu.be) I remind you that the frames are compared against a fixed frame, in this case the first frame of the video. [Full code](http://pastebin.com/9rNeXf2t) – iRomul May 21 '14 at 11:16
  • Well, if your video is panning away from the first frame, then you cannot possibly use it as a reference, because it will not be possible to calculate the transformation when there are no common features between the frames (which evidently happens in your video). You would need to use a floating reference. – the swine May 21 '14 at 11:24
  • A floating reference would be implemented by calculating the transformation between consecutive frames and stacking the transformations. At the same time, at every step, the transformation stack is interpolated with the identity matrix, so that the view does not drift away over time. That should create a smooth movement; see the sketch after these comments. – the swine May 21 '14 at 11:28
  • BTW we successfully used this method to stabilize internal view of an aircraft cabin in order to be able to detect pilot pose / read out values of the dashboard indicators (see https://www.youtube.com/watch?v=pkHPm_8MK_I). There is a thesis on that, but it is unfortunately in Czech language. – the swine May 21 '14 at 11:31
  • Thank you! Unfortunately, this method is quite complicated for me, but I will try to implement it. And I would like to ask a final question - if you compile a project in Visual Studio 2012 using Debug mode, the program works extremely slow. Are there any factors that affect performance? – iRomul May 21 '14 at 11:40
  • Yes, it is unfortunately not simple, sorry. Don't ask me, ask SO: http://stackoverflow.com/questions/11564267/why-is-debug-mode-slower-than-release-in-vs :). – the swine May 21 '14 at 11:51
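
To make the floating-reference idea from the comments concrete, here is a minimal sketch (find_transform() is a hypothetical placeholder for any frame-to-frame alignment, e.g. the homography pipeline above, and alpha is an assumed tuning constant, not a value taken from the discussion):

#include <opencv2/opencv.hpp>

// Hypothetical placeholder: returns a 3x3 CV_64F transform that maps
// points of cur onto prev (e.g. the feature-based homography above)
cv::Mat find_transform(const cv::Mat& prev, const cv::Mat& cur);

cv::Mat H_acc = cv::Mat::eye(3, 3, CV_64F); // stacked transformations
const double alpha = 0.05;                  // assumed pull toward identity

cv::Mat stabilize_step(const cv::Mat& prev, const cv::Mat& cur) {
  // Stack the per-frame transformation onto the accumulated one
  cv::Mat H_step = find_transform(prev, cur);
  H_acc = H_acc * H_step;

  // Interpolate the stack with identity so the view does not drift away
  H_acc = (1.0 - alpha) * H_acc + alpha * cv::Mat::eye(3, 3, CV_64F);

  cv::Mat out;
  cv::warpPerspective(cur, out, H_acc, cur.size());
  return out;
}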

OpenCV can do it for you in three lines of code (it is definitely the shortest way, and maybe even the best):

t = estimateRigidTransform(newFrame, referenceFrame, false); // false = restrict to translation, rotation and uniform scaling instead of a full affine transform
if(!t.empty()) {
    warpAffine(newFrame, stableFrame, t, Size(newFrame.cols, newFrame.rows)); // stableFrame should be stable now
}

You can turn off some kinds of transformations by modifying the matrix t, which can lead to a more stable result. This is just the core idea; you can then adapt it any way you want: change referenceFrame, smooth the set of transformation parameters taken from the matrix t, and so on.
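
One possible reading of the parameter smoothing mentioned above is to low-pass the estimated camera path and correct only the deviation from it, so that deliberate pans survive while the shake is removed. A sketch under that assumption (only the translation part of t is filtered here; the smoothing factor is an arbitrary choice, and a real implementation would treat rotation the same way):

#include <opencv2/opencv.hpp>

double sx = 0, sy = 0;        // smoothed (intended) camera offset
const double smoothing = 0.9; // assumed factor: higher = smoother path

cv::Mat stabilize(const cv::Mat& newFrame, const cv::Mat& referenceFrame) {
  cv::Mat stableFrame = newFrame.clone();
  cv::Mat t = cv::estimateRigidTransform(newFrame, referenceFrame, false);
  if(!t.empty()) {
    double dx = t.at<double>(0, 2), dy = t.at<double>(1, 2);

    // Low-pass filter the compensating translation; sx/sy follow the
    // slow, intentional part of the camera movement
    sx = smoothing * sx + (1.0 - smoothing) * dx;
    sy = smoothing * sy + (1.0 - smoothing) * dy;

    // Correct only the high-frequency part of the motion, i.e. the shake
    t.at<double>(0, 2) = dx - sx;
    t.at<double>(1, 2) = dy - sy;

    cv::warpAffine(newFrame, stableFrame, t,
                   cv::Size(newFrame.cols, newFrame.rows));
  }
  return stableFrame;
}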

Vit