
I am using the Lucas-Kanade optical flow method to track points from one image to the next. OpenCV provides an easy-to-use function for this:

    void cv::calcOpticalFlowPyrLK (
        InputArray       prevImg,
        InputArray       nextImg,
        InputArray       prevPts,
        InputOutputArray nextPts,
        OutputArray      status,
        OutputArray      err,
        Size             winSize = Size(21, 21),
        int              maxLevel = 3,
        TermCriteria     criteria = TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, 30, 0.01),
        int              flags = 0,
        double           minEigThreshold = 1e-4
    )

After getting the tracked points in nextPts, I iterated over all the point pairs and calculated the Euclidean distance between each pair:

    diff_x = prev_pt.x - curr_pt.x;
    diff_y = prev_pt.y - curr_pt.y;
    euclidean_dist = sqrt(pow(diff_x, 2) + pow(diff_y, 2));
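
The per-pair computation above can be wrapped in a loop that also consults the `status` output. This is a minimal sketch, not the code used in the question: `Pt` is a hypothetical stand-in for `cv::Point2f` so that it compiles without OpenCV, and `max_tracked_displacement` is an illustrative name. It assumes that entries with `status == 0` (flow not found, per the OpenCV documentation) should be excluded from the maximum. Whether such entries account for the 500 px figure is not confirmed here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Point2f so the sketch compiles without OpenCV.
struct Pt {
    float x;
    float y;
};

// Maximum Euclidean displacement over point pairs whose status flag is non-zero.
// Returns 0.0 when no pair was successfully tracked.
double max_tracked_displacement(const std::vector<Pt>& prev_pts,
                                const std::vector<Pt>& next_pts,
                                const std::vector<unsigned char>& status) {
    const std::size_t n =
        std::min({prev_pts.size(), next_pts.size(), status.size()});
    double max_dist = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (status[i] == 0) {
            continue;  // flow was not found for this point; skip it
        }
        const double dx = prev_pts[i].x - next_pts[i].x;
        const double dy = prev_pts[i].y - next_pts[i].y;
        max_dist = std::max(max_dist, std::hypot(dx, dy));
    }
    return max_dist;
}
```

Comparing the result with and without the `status` check is a quick way to see whether the largest distances come from points the tracker itself flagged as lost.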

During the iterations, I also kept track of the maximum distance, and to my surprise it turned out to be around 500 px. How is that even possible? The size of the search window at each pyramid level, i.e. winSize, is set to cv::Size(21, 21), so the most distant point pair should look like this:

[Figure: Lucas-Kanade Opt Flow - Search Window]

Hence, from my understanding, for winSize = cv::Size(21, 21) the maximum distance cannot exceed 14.14 px, right?

(Here is how I got the 14.14 figure):

    Pi = cv::Point(10, 10)
    Pj = cv::Point(20,  0)
    diff_x = 10
    diff_y = 10
    euclidean_dist = sqrt(10^2 + 10^2)
                   = sqrt(200)
                   = 14.14

So I don't understand why the maximum Euclidean distance between tracked point pairs is as high as around 500 px.

As per the OpenCV documentation, the parameter description for winSize says:

winSize - size of the search window at 'each' pyramid level.

Does that mean winSize remains constant even when the image is downscaled in the pyramid? And because of that, might some point pairs that are far apart at full resolution fall within the search window at the upper (coarser) levels of the pyramid?
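
To put a number on that idea, here is a small sketch of the arithmetic. It rests on two assumptions that the OpenCV documentation quoted above does not confirm: each pyramid level halves the image resolution, and the displacement found at one level is bounded by the half-window diagonal used in the 14.14 px calculation. The function names are illustrative only.

```cpp
#include <cmath>

// Largest offset from the window center to a window corner,
// measured in pixels of the pyramid level the window is applied to.
double half_window_diagonal(int win_size) {
    const double half = (win_size - 1) / 2.0;
    return std::hypot(half, half);
}

// The same offset expressed in full-resolution pixels, ASSUMING every
// pyramid level halves the image size (a factor of 2 per level).
double reach_in_full_res_pixels(int win_size, int level) {
    return half_window_diagonal(win_size) * std::ldexp(1.0, level);  // * 2^level
}
```

Under these assumptions, `reach_in_full_res_pixels(21, 0)` gives about 14.14 and `reach_in_full_res_pixels(21, 3)` gives about 113.14, which is still well short of 500 px.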

Milan
  • Yes sure, the pyramid is used to extend the search region in an efficient way. Still, with 3 (or 4?) levels and naive binary pyramid, it would lead to a maximum distance of 113. But I don't know whether the pyramid has a bigger subdivision factor or it is handed-over on the coarsest level. – Micka Jun 22 '23 at 14:31
  • brightness constancy assumption underlies the algorithm. practically, numerics ruin that a little. LK doesn't just match a window. it uses a difference/gradient. that can send you far away if the numerics make it happen. It's somewhat like numerical root finding. I'd suggest setting up an 1D toy example, a signal you can manipulate interactively, and implementing the numerics for yourself on that. – Christoph Rackwitz Jun 22 '23 at 14:39
  • @Micka I don't think I completely understood your statement: _"But I don't know whether the pyramid has a bigger subdivision factor or it is handed-over on the coarsest level."_ But I did try playing with the `maxLevel` parameter and even with its value = `0` (i.e. pyramids are not used -- single level), the max Euclidean Distance was nowhere close to `14.14`. It was still in the range of `500`s! – Milan Jun 23 '23 at 17:55
  • @Micka Also, in your statement, _"Still, with 3 (or 4?) levels and naive binary pyramid, it would lead to a maximum distance of 113"_, how did you come up with the number `113`? As I have explained/thought in my question, the max Euclidean distance cannot/should not be more than `14.14` px. – Milan Jun 23 '23 at 17:58
  • @ChristophRackwitz I appreciate your suggestion. However, to implement that, I will have to dig way deeper into the Lucas Kanade algorithm. That's why I thought someone with more experience may have some good idea/insight about what's going on and why I am getting such a high maximum Euclidean distance. TIA! – Milan Jun 23 '23 at 18:02
  • 14.14*2*2*2 for 3 binary subdivisions. – Micka Jun 23 '23 at 21:20
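
The 1D toy example suggested in the comments could look roughly like the sketch below. It is not OpenCV's implementation: the signal is an analytic Gaussian bump (so no interpolation is needed), the update is a plain Gauss-Newton step on the windowed squared difference, and all names (`bump`, `lk_step_1d`, `track_1d`) are made up for illustration. It only shows the general point that a gradient-based update is a ratio of two sums, and that the ratio is not bounded by the window size.

```cpp
#include <cmath>

// Smooth 1D test signal: a Gaussian bump centered at x = 0 (sigma = 5 samples).
double bump(double x) { return std::exp(-x * x / 50.0); }

// Analytic derivative of bump().
double bump_dx(double x) { return -x / 25.0 * bump(x); }

// One Gauss-Newton (Lucas-Kanade style) update of the displacement estimate d.
// prev(x) = bump(x) and next(x) = bump(x - true_shift); the window covers the
// integer samples x in [-half_win, half_win] around the tracked point x = 0.
double lk_step_1d(double d, double true_shift, int half_win) {
    double num = 0.0;
    double den = 0.0;
    for (int x = -half_win; x <= half_win; ++x) {
        const double warped = bump(x + d - true_shift);  // next(x + d)
        const double grad = bump_dx(x + d - true_shift); // next'(x + d)
        num += grad * (bump(x) - warped);
        den += grad * grad;
    }
    return num / den;  // no safeguard against a tiny denominator, on purpose
}

// Iterate the update from d = 0 until the step is negligible.
double track_1d(double true_shift, int half_win, int max_iters) {
    double d = 0.0;
    for (int i = 0; i < max_iters; ++i) {
        const double step = lk_step_1d(d, true_shift, half_win);
        d += step;
        if (std::abs(step) < 1e-6) {
            break;
        }
    }
    return d;
}
```

With a small shift, e.g. `track_1d(2.0, 10, 30)`, the estimate settles near 2. With a shift far outside the window, e.g. `lk_step_1d(0.0, 40.0, 10)`, the window sees almost no gradient, the denominator becomes tiny, and the very first step is far larger than the 21-sample window. Real implementations guard against this to some degree (the signature above has a `minEigThreshold` parameter for that purpose), so this toy does not by itself explain the 500 px result.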
