
I have calculated a disparity map for a given rectified stereo pair! I can calculate depth using the formula

z = (baseline * focal) / (disparity * p)

Let's assume that the baseline, the focal length and the pixel constant p are known, and that I used the same camera for both images. Now, my disparity may lie in the range -32..128 [pixel]. With the above formula I get infinity/division by zero wherever the disparity is 0. If I shift my disparity values to, let's say, 1..161, I have chosen that range arbitrarily, and that's a problem: the function 1/disparity gives a completely different value spacing over 1..161 than over 100..260, and the change isn't even linear. So I wouldn't even get a reconstruction up to a (linear) scale, because the scale change is non-linear.
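
A quick numeric check of that non-linearity (a sketch with made-up constants; the values of baseline, focal and p are arbitrary here):

```python
# Illustrative constants; the point is the shape of 1/disparity, not the units.
baseline, focal, p = 0.5, 1000.0, 1.0

def z(d):
    return baseline * focal / (d * p)

# The same disparity step looks very different in two shifted ranges:
print(z(1) / z(2))      # 2.0    -> doubling the disparity halves the depth
print(z(100) / z(101))  # ~1.01  -> after a +99 shift the ratio almost vanishes
# No single scale factor maps one set of depths onto the other.
```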

How can I determine in which range my disparity has to lie to get a metric reconstruction with the above formula? Or is it simply not possible to reconstruct something metrically from rectified images with this formula? And if that's the case, why?

(I know I can reproject to my non-rectified images and do a triangulation, but I want to know specifically WHY or IF it is not possible with the above formula. Thanks to anyone who can help!)

Miau
  • Is your rig metrically calibrated, or are you doing just a projective reconstruction? – Francesco Callari May 14 '18 at 03:42
  • Thanks for your answer! In my particular scenario I have no calibration matrix K, but I do have the pixel constant p. But even if I had the calibration matrix K, the above formula wouldn't give a metric reconstruction, would it? (The only possibility I can imagine: if I had a single z value for a given disparity, I could shift my disparity values into the corresponding value range.) – Miau May 16 '18 at 15:00
  • [ note: for me it is important to know if metric depth can be determined with this formula or not ] – Miau May 16 '18 at 15:12
  • You wrote in another post that reconstructions from rectified images using the above formula are projective. "The parallel-camera formula gives you a depth at a given pixel with respect to an ideal camera that observes the rectified image." - But haven't I just transformed my images rigidly, so that the result is still metric? What is the exact difference between a rectified image pair and an ideal stereo image pair? And isn't it possible to get negative disparities even in an ideal stereo setup (which also leads to the problem stated above)? Thanks in advance if you have time to clarify this! – Miau May 16 '18 at 18:19
  • I am facing a similar issue. How did you get the value of P? – Shashank Dhar Nov 24 '19 at 07:06

4 Answers


The problem is that the rectification will, in general, scale and rotate your images, so you can't just forward-project depth from the rectified left camera and get a metric reconstruction. Rather, you need to undo the rectification on the correspondences. You do that by computing a projective matrix Q that maps disparity to 3D; see the documentation for stereoRectify and reprojectImageTo3D in the OpenCV docs.

For a few points, or to understand what's going on, you can proceed step by step. In recipe form (a code sketch follows the list):

  • For every pixel (x, y) in the rectified left image, look up the disparity d of the corresponding pixel in the rectified right image: (x', y') = (x + d, y)
  • Apply to (x, y) and (x', y') the inverse of the rectification homographies H and H', obtaining (u, v) and (u', v') in the original image coordinates.
  • Backproject these pixels and intersect the rays.
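
For illustration, here is a minimal Python/OpenCV sketch of that recipe. The names disp, H1, H2, P1 and P2 are assumptions for what you have at hand (the rectification homographies and the original 3x4 projection matrices); treat it as a sketch, not a reference implementation.

```python
import numpy as np
import cv2

def triangulate_via_unrectification(disp, H1, H2, P1, P2):
    """disp: left-rectified disparity map; H1/H2: rectification homographies;
    P1/P2: 3x4 projection matrices of the ORIGINAL cameras (assumed given)."""
    H1_inv, H2_inv = np.linalg.inv(H1), np.linalg.inv(H2)
    pts_l, pts_r = [], []
    h, w = disp.shape
    for y in range(h):
        for x in range(w):
            d = disp[y, x]
            if not np.isfinite(d):
                continue                      # skip invalid disparities
            # step 1: rectified correspondence (x, y) <-> (x + d, y)
            # step 2: undo the rectification homographies
            p = H1_inv @ np.array([x, y, 1.0])
            q = H2_inv @ np.array([x + d, y, 1.0])
            pts_l.append(p[:2] / p[2])
            pts_r.append(q[:2] / q[2])
    # step 3: backproject and intersect the rays (linear triangulation)
    X = cv2.triangulatePoints(P1, P2, np.array(pts_l).T, np.array(pts_r).T)
    return (X[:3] / X[3]).T                   # Nx3 points
```

For a dense map, the one-call equivalent is cv2.reprojectImageTo3D(disp, Q) with the Q matrix returned by cv2.stereoRectify, as noted above.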
Francesco Callari
  • Thank you for the clarification. I completely understand your outlined method, and I know that this is the "true (and maybe only) way" to get a metric reconstruction. Nevertheless, a part of my question is still unanswered: could you maybe verify whether my statement is correct? The pictures of a rectified stereo pair don't capture the scene the way an ideal stereo pair would. Although the same world points have the same y pixel coordinate, the image content does not really fit the transformed (rectified) camera setup, and therefore the computed 3D coordinates differ from the metric ones. – Miau May 17 '18 at 09:34
  • That means it is only possible to compute metric depth with the above formula when you have an ideal camera setup, where you don't need a rectification and the cameras are placed such that you only have positive disparities starting from the left image/camera. When your disparity contains negative values, even with an ideal setup the scale of your depth will be calculated wrongly. – Miau May 17 '18 at 09:41
  • Afraid little of what you wrote makes sense to me. A perfectly parallel stereo rig hardly exists, especially at the resolution of modern sensors. To convince yourself that this is the case, compute, for an ordinary DSLR sensor and lens, the accuracy required to register within a row of pixels two cameras (or one camera translated) with a baseline of, say, 1/2 meter. Very hard to do outside of a laboratory bench. – Francesco Callari May 18 '18 at 04:56

I did some more research and think I can now answer my own question. I think in the comments we talked a bit past each other; maybe it is now clearer what I meant.

Parallel Setup: The formula z = (baseline * focal) / (disparity * p) can only be used if the images were captured by a parallel camera setup. If the cameras are truly parallel, it is not possible to have both negative and positive disparities, and you won't get a disparity value of 0 for any finite point: a disparity of 0 corresponds only to a point at infinity. If a truly parallel setup is present, this formula can be used for a metric reconstruction.

Converged Setup: In reality your images are usually captured by a converged camera setup. That means a point of convergence exists in the stereo-pair images that has a disparity value of 0, and the disparities in front of and behind that point have different signs. So your disparity map contains negative values, positive values, and zero at the point of convergence. Even though your images are rectified, you cannot use the above formula, because the images were captured by a converged stereo camera setup. It is not possible to shift your disparities to "only positive values" and then use the formula correctly. The result computed from shifted values will be somewhat similar to the correct 3-D reconstruction, but strangely scaled and distorted by an unknown transformation (the simulation below illustrates this).
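
A small 2-D simulation makes this concrete: converged cameras produce mixed-sign disparities around the fixation depth, and after shifting them into a positive range the parallel-setup formula does not recover the true depths up to any single scale factor. All numbers are illustrative.

```python
import numpy as np

f, b, phi = 500.0, 0.5, np.radians(5.0)  # focal [px], baseline [m], vergence angle

def project(px, pz, cx, rot):
    # 2-D pinhole camera at (cx, 0) whose optical axis is rotated inward by rot
    x, z = px - cx, pz
    xc = np.cos(rot) * x - np.sin(rot) * z   # world -> camera coordinates
    zc = np.sin(rot) * x + np.cos(rot) * z
    return f * xc / zc                       # image x-coordinate [px]

pts = [(-0.3, 2.0), (0.0, 3.0), (0.4, 5.0), (0.1, 8.0)]   # world (x, z) in meters
d = np.array([project(x, z, -b / 2, +phi) - project(x, z, +b / 2, -phi)
              for x, z in pts])
print("disparities:", d)                  # mixed signs around the fixation depth

d_shifted = d - d.min() + 1.0             # arbitrary shift into a positive range
z_est = f * b / d_shifted                 # parallel-setup formula (p = 1)
z_true = np.array([z for _, z in pts])
print("z_est / z_true:", z_est / z_true)  # not constant -> no single scale factor
```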

Miau
  • "strangely scaled and distorted" does not answer the question. – decadenza Jul 04 '20 at 09:44
  • Well, read the sentence till the end. "..by an unknown transformation." You cannot quantify the distortion of the result trivially because it depends on the particular images. In addition, the question did not ask for this aspect. – Miau Jun 09 '21 at 13:20

You may look at this graph to find their relationship: [figure omitted: depth plotted as a function of disparity]

refer to http://web.stanford.edu/class/cs231a/lectures/lecture6_stereo_systems.pdf
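
Since the image did not survive, here is a sketch that reproduces the kind of curve those slides show, plotted directly from the question's formula (the constants are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

baseline, focal, p = 0.5, 1000.0, 1.0     # illustrative values only
d = np.linspace(1, 128, 256)              # positive disparities [px]
plt.plot(d, baseline * focal / (d * p))   # depth falls off as 1/disparity
plt.xlabel("disparity [px]")
plt.ylabel("depth z")
plt.title("z = (baseline * focal) / (disparity * p)")
plt.show()
```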

fandulu

I looked up some papers about binocular vision, and I learned that the geometry is different when the two cameras are convergent. But if some information is known, for example the focal length, the convergence angles of the two cameras, and the baseline, it is still possible to calculate the depth of an object.

Here I try to show how I see this problem:

[figure omitted: converged cameras Q and R fixating point F, forming the isosceles triangle △FQR]

The points Q and R are the two cameras with convergent axes. As the poster Miau pointed out, there will be a point that has 0 disparity: the intersection point of the two camera axes, F (some call it the fixation point). Suppose the angle at F is θ; then the two base angles are (180° − θ)/2, and the isosceles triangle △FQR is formed.

(In fact, the 0-disparity points form a circle, the horopter, passing through the points F, Q, and R.)

Consider the two pinhole cameras, both with focal length f. Point F projects to the same location in both images (F' = F"). We can then measure disparities relative to the fixation point F.

Now consider an arbitrary point P whose depth from the baseline we want to find. Assume P is inside the horopter; then the projections of P in the two images differ (P' ≠ P"). We can call these offsets the disparities relative to F: d1 = F'P' and d2 = F"P", both measured in pixels. The disparity angles are ∠a = ∠P'QF' = ∠PQF and ∠b = ∠P"RF" = ∠PRF; they can be calculated from the focal length f and the disparities d1 and d2 as ∠a = arctan(d1/f) and ∠b = arctan(d2/f).

Then the two base angles of the arbitrary triangle △PQR can be calculated: ∠c = (180° − θ)/2 − ∠a and ∠d = (180° − θ)/2 + ∠b. The baseline length l is known, so the depth of the arbitrary point P is the height of triangle △PQR over the baseline.
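
A hedged implementation of this angle-based triangulation (the parameter names are mine, and the sign conventions depend on which side of F' and F" the projections of P fall, so treat it as a sketch):

```python
import math

def depth_from_convergent_pair(theta_deg, f, d1, d2, baseline):
    # theta_deg: convergence angle at F; f: focal length [px];
    # d1, d2: pixel disparities of P relative to F' and F"; baseline in meters.
    base = math.radians((180.0 - theta_deg) / 2.0)  # base angles of isosceles FQR
    ang_a = math.atan(d1 / f)                       # angle P'QF'  = angle PQF
    ang_b = math.atan(d2 / f)                       # angle P"RF"  = angle PRF
    ang_c = base - ang_a                            # angle of triangle PQR at Q
    ang_d = base + ang_b                            # angle of triangle PQR at R
    # Height of triangle PQR over the baseline QR = depth of P
    return baseline * math.tan(ang_c) * math.tan(ang_d) / (
        math.tan(ang_c) + math.tan(ang_d))

# Illustrative call: 20 deg convergence, 500 px focal length, 0.5 m baseline
print(depth_from_convergent_pair(20.0, 500.0, 5.0, 3.0, 0.5))
```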

I think the key is to find the locations of the points F and P in the two camera images (F', F" and P', P") and the disparities d1 and d2.

As Miau mentioned, points that fall outside the horopter will have negative disparities relative to the ones inside the horopter. But I think the geometry stays similar.

Some information can be found in the following papers:

https://arxiv.org/abs/2012.06363

https://www.annualreviews.org/doi/pdf/10.1146/annurev-vision-091718-014942

I would welcome more suggestions and discussion.

Tautaltao