How to 3d reconstruct robustly from multiple images with known poses in OpenCV

Question

The traditional approach, usually demonstrated on high-resolution images, is:

1. extract (dense) features for all images
2. match features to find tracks through the images
3. triangulate the tracks into 3D points

I see two problems in my case (many 640×480 images with small movements between them). First, matching is very slow, especially when the number of images is large, so optical flow tracking may be a better option. However, the tracks become sparse with large movements (a mix of the two could solve this).

Second, triangulating the tracks: although it is an over-determined problem, I find it hard to code a solution. (Here I am asking for a simplified version of what I have read in the references.)

I searched quite a bit for libraries that do this, with no useful result.

Again, I have ground-truth camera matrices and only need 3D positions as a first estimate (without bundle adjustment). A ready-made software solution would be a great help, as I don't want to reinvent the wheel, though detailed instructions would also be helpful.

I'm not qualified to answer, but check out this tutorial, maybe it gives you some ideas: http://www.morethantechnical.com/2012/02/07/structure-from-motion-and-3d-reconstruction-on-the-easy-in-opencv-2-3-w-code/ — reden, Aug 27 '18 at 07:27
I think any problem here can be solved by a higher image resolution, but that is not always available. — Yasin Yousif, Aug 29 '18 at 05:32

score 5 · Answer 1 · answered Oct 18 '19 at 20:58

[Figure: epipolar geometry of two views] This basically shows the underlying geometry for estimating depth.

As you said, we have the camera poses Q. Pick a point X in the world; X_L is its projection in the left image. With Q_L, Q_R and X_L we can construct the epipolar plane (green in the figure). The rest is easy: we search along the ray (Q_L, X), which is exactly the set of possible depths for X_L. Different depth hypotheses X1, X2, ... give different projections in the right image.

Now we compare the pixel intensity at X_L with the intensity at each reprojected point in the right image, pick the candidate with the smallest difference, and take its depth as the estimate.
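The search described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the reference camera sits at the origin, `R` and `t` map reference-camera coordinates into the second camera, and the function names (`project`, `search_depth`) and the synthetic test scene (a random texture on a plane at depth 5) are made up for the example.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D point X (reference-camera coordinates) into a camera K [R | t]."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def search_depth(img_ref, img_other, K, R, t, pixel, depths):
    """Try each depth hypothesis for `pixel` and keep the best intensity match."""
    u, v = pixel
    # Back-projected ray with z-component 1, so depth * ray has the given depth.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    best_depth, best_cost = None, np.inf
    for z in depths:
        u2, v2 = np.rint(project(K, R, t, z * ray)).astype(int)
        if not (0 <= v2 < img_other.shape[0] and 0 <= u2 < img_other.shape[1]):
            continue  # hypothesis projects outside the second image
        cost = abs(float(img_ref[v, u]) - float(img_other[v2, u2]))
        if cost < best_cost:
            best_depth, best_cost = float(z), cost
    return best_depth, best_cost

# Synthetic scene (assumption): a textured plane at depth 5 seen by two cameras
# 0.1 units apart along x, so the second image is the first shifted by f*b/depth pixels.
f, b, true_depth = 500.0, 0.1, 5.0
K = np.array([[f, 0.0, 320.0], [0.0, f, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([-b, 0.0, 0.0])
left = np.random.default_rng(0).random((480, 640))
right = np.roll(left, -int(round(f * b / true_depth)), axis=1)

# Sample hypotheses uniformly in inverse depth: one pixel of disparity per step.
depths = f * b / np.arange(5, 26)
best_depth, best_cost = search_depth(left, right, K, R, t, (300, 200), depths)
```

Sampling in inverse depth keeps the projected candidates evenly spaced along the epipolar line. On real images a single-pixel comparison is far too ambiguous, which is the point the answer makes next.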

Pretty easy, eh? In truth it is much harder, because image intensity is never a strictly convex function:

This makes matching extremely hard: with a non-convex image, any distance function has multiple critical points (candidate matches). How do you decide which one is correct?

To handle this, people proposed patch-based matching with metrics such as SAD, SSD and NCC, which make the distance function as close to convex as possible. Still, they cannot handle large-scale repeated texture or low-texture regions.

To solve this, people started searching over a long range along the epipolar line, and found that the matching scores over that range can be described as a distribution over depth.
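For reference, the three patch metrics can be written as below. This is a minimal sketch: the function names are my own, and the NCC shown is the zero-mean variant, which is one common definition and not necessarily the one the answer has in mind. SAD and SSD are costs (lower is better), while NCC is a similarity (closer to 1 is better).

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return float(np.abs(a - b).sum())

def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return float(((a - b) ** 2).sum())

def ncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation, in [-1, 1]."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum()) + eps  # eps guards flat patches
    return float((a0 * b0).sum() / denom)

# A 7x7 patch and a copy with changed gain and offset (e.g. different exposure).
patch = np.random.default_rng(1).random((7, 7))
brighter = 1.5 * patch + 0.2

scores = {"sad": sad(patch, brighter), "ssd": ssd(patch, brighter), "ncc": ncc(patch, brighter)}
```

In this toy case SAD and SSD report a mismatch because of the brightness change, while NCC stays at essentially 1, which is why it is often preferred when exposure varies between frames.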

[Figure: matching score plotted against depth] The horizontal axis is depth and the vertical axis is the matching score. This leads to the depth filter: we usually model the distribution as a Gaussian (a Gaussian depth filter) and use it to describe the uncertainty of the depth. Combined with patch matching, this gives a rough depth proposal.
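One simple way to picture a Gaussian depth filter is as repeated fusion of Gaussian estimates. The sketch below assumes the merge is the plain product of two Gaussians, which is my reading of the description and not something the answer spells out; the names `fuse_gaussian` and `run_depth_filter`, the prior, and the measurement values are all invented for illustration.

```python
def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two Gaussian depth estimates (product of Gaussians)."""
    total = var_a + var_b
    mu = (var_b * mu_a + var_a * mu_b) / total  # each mean weighted by the other's variance
    var = var_a * var_b / total                 # always smaller than both inputs
    return mu, var

def run_depth_filter(prior, measurements, var_stop=0.05):
    """Fold measurements into the prior until the variance is small enough."""
    mu, var = prior
    used = 0
    for mu_m, var_m in measurements:
        mu, var = fuse_gaussian(mu, var, mu_m, var_m)
        used += 1
        if var < var_stop:  # treat the pixel's depth as converged
            break
    return mu, var, used

# Wide initial guess, then per-frame depth measurements as (mean, variance) pairs.
prior = (5.0, 4.0)
measurements = [(4.2, 0.2), (3.9, 0.2), (4.1, 0.2), (4.0, 0.2), (4.05, 0.2)]
mu, var, used = run_depth_filter(prior, measurements)
```

Each fusion shrinks the variance, so the estimate tightens as more frames observe the pixel, and the loop stops once the uncertainty falls below the chosen threshold.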

Finally, an optimization method such as Gauss-Newton (GN) or gradient descent refines the depth estimate.

To sum up, depth estimation proceeds as follows:
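As an illustration of the refinement step, here is a small Gauss-Newton loop. Note the swap: it minimizes the geometric reprojection error of a 3D point against known 3x4 camera matrices, not the photometric error the answer is probably referring to, because the geometric version is easier to show in a self-contained way. The function names and the three-camera test setup are assumptions for the example.

```python
import numpy as np

def residuals_and_jacobian(projections, pixels, X):
    """Stack reprojection residuals and their Jacobian with respect to X."""
    res, jac = [], []
    Xh = np.append(X, 1.0)
    for P, (u, v) in zip(projections, pixels):
        x = P @ Xh
        res.extend([x[0] / x[2] - u, x[1] / x[2] - v])
        # Quotient rule applied to u = x0/x2 and v = x1/x2.
        jac.append((P[0, :3] * x[2] - x[0] * P[2, :3]) / x[2] ** 2)
        jac.append((P[1, :3] * x[2] - x[1] * P[2, :3]) / x[2] ** 2)
    return np.array(res), np.array(jac)

def refine_point(projections, pixels, X0, iterations=10, tol=1e-10):
    """Gauss-Newton: solve the normal equations J^T J step = -J^T r each iteration."""
    X = np.asarray(X0, dtype=float).copy()
    for _ in range(iterations):
        r, J = residuals_and_jacobian(projections, pixels, X)
        step = np.linalg.solve(J.T @ J, -J.T @ r)
        X = X + step
        if np.linalg.norm(step) < tol:
            break
    return X

# Three cameras shifted along x (assumed setup) observing one noise-free point.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), [[-b], [0.0], [0.0]]]) for b in (0.0, 0.1, 0.2)]
X_true = np.array([0.2, -0.1, 4.0])
pixels = []
for P in cams:
    x = P @ np.append(X_true, 1.0)
    pixels.append(x[:2] / x[2])

X_refined = refine_point(cams, pixels, X_true + np.array([0.05, -0.05, 0.3]))
```

Gauss-Newton needs a reasonable starting point, which is why it comes after the coarse search: the search supplies the proposal and the optimizer only polishes it.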

1. assume the depth of every pixel follows an initial Gaussian distribution
2. search along the epipolar line and reproject the candidate points into the target frame
3. triangulate the depth and compute its uncertainty from the depth filter
4. run steps 2 and 3 again to get a new depth distribution and merge it with the previous one; if the result has converged, stop, otherwise start again from step 2
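For the triangulation in step 3, and for the over-determined multi-view case raised in the question, a linear (DLT) triangulation from known 3x4 camera matrices is a common starting point. The sketch below is one possible way to code it, not the method of any particular library; the function names and the test cameras are assumptions. OpenCV's `cv2.triangulatePoints` covers the two-view case, while this version takes any number of views.

```python
import numpy as np

def triangulate_track(projections, pixels):
    """Linear (DLT) triangulation of one track seen in N >= 2 views.

    projections: list of 3x4 camera matrices P = K [R | t]
    pixels:      list of (u, v) observations, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        rows.append(u * P[2] - P[0])  # from u = (P0 . X) / (P2 . X)
        rows.append(v * P[2] - P[1])  # from v = (P1 . X) / (P2 . X)
    A = np.stack(rows)
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def reprojection_error(P, X, pixel):
    """Pixel distance between an observation and the projection of X."""
    x = P @ np.append(X, 1.0)
    return float(np.linalg.norm(x[:2] / x[2] - np.asarray(pixel)))

# Three cameras shifted along x (assumed setup) observing one noise-free point.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), [[-b], [0.0], [0.0]]]) for b in (0.0, 0.1, 0.2)]
X_true = np.array([0.2, -0.1, 4.0])
obs = []
for P in cams:
    x = P @ np.append(X_true, 1.0)
    obs.append(x[:2] / x[2])

X_est = triangulate_track(cams, obs)
errors = [reprojection_error(P, X_est, px) for P, px in zip(cams, obs)]
```

With noisy tracks, the per-view reprojection errors give a simple rejection test: points whose error exceeds a chosen threshold can be dropped before any further refinement.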

That's another interesting way. In fact, I coded my [own solution](https://github.com/engyasin/3D-reconstruction_with_known_poses). I'd like to mention that I solved the problem by using dense feature detection and then raising the noise thresholds, so that only the non-noisy, "certain" points are triangulated. — Yasin Yousif, Oct 19 '19 at 09:05
