Why is the reconstructed model a scaled version using SfM (Structure from Motion)?

Question

 I am learning structure from motion by myself and have read many materials.

Even when the camera's intrinsic parameters are known, the result is called a metric reconstruction, yet the model is only recovered up to scale. According to one source, a metric 3D reconstruction means that the distance between the two captures is unknown. Why can't I get that distance from point correspondences and the intrinsic parameters? Can I get a model with physical dimensions if I use more than two images?

Thanks in advance.

Regards Jogging

user3146587 · Answer 1 · 2014-01-25T12:59:33.773

If both the cameras and the scene are correspondingly scaled, the change will not be discernible in the captured images. That's why the scale factor is unknown in SfM. To obtain it, some physical measurement of the scene or the camera motion is typically needed.

If you're not convinced, simply do the math:

Let (P1 p1) and (P2 p2) be the 3x4 projection matrices of two cameras 1 and 2 (P1 is a 3x3 matrix and p1 a column vector), M a point in the scene, and m1 and m2 the respective projections of M in cameras 1 and 2. We have (~= means "is proportional to", because of the perspective division):

m1 ~= P1 M + p1
m2 ~= P2 M + p2

Introducing the camera centers C1 = -P1^-1 p1, C2 = -P2^-1 p2 and the translation T = C2 - C1 between the cameras, this can be written:

m1 ~= P1 (M - C1)
m2 ~= P2 (M - C2) = P2 (M - (C1 + T))

Now scale the whole scene by a factor of s and translate its origin by o: M' = s M + o. Introduce two cameras 1' and 2' that are versions of 1 and 2 with the inverse scaling factor, i.e. P1' = 1/s P1 and P2' = 1/s P2. Scale and offset their centers in the same way as the scene: C1' = s C1 + o and C2' = s (C1 + T) + o. The relative translation between the two cameras is now C2' - C1' = s T. The projections of M' in 1' and 2' are:

m1' ~= P1' (M' - C1') = 1/s P1 (s M + o - s C1 - o) = P1 (M - C1)
    ~= m1
m2' ~= P2' (M' - C2') = 1/s P2 (s M + o - s (C1 + T) - o) = P2 (M - C2)
    ~= m2

So in the end, you get the same projections (your input in an SfM problem) with a scene that has a different scale and origin and correspondingly scaled and translated cameras. This can be generalized to more than two cameras.
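
The derivation above can also be checked numerically. The following is a minimal NumPy sketch, not a full SfM pipeline: the matrices, the scene point, the scale s = 2.5 and the offset o are arbitrary values chosen for illustration, and `project` is a hypothetical helper that implements m ~= P (M - C).

```python
import numpy as np

rng = np.random.default_rng(0)

def project(P, C, M):
    """Project point M with a camera of 3x3 part P and center C: m ~= P (M - C)."""
    m = P @ (M - C)
    return m[:2] / m[2]  # perspective division

# Arbitrary (illustrative) cameras and scene point.
P1 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
P2 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
C1 = np.array([0.0, 0.0, 0.0])
T = np.array([1.0, 0.2, 0.0])   # translation between the camera centers
C2 = C1 + T
M = np.array([0.3, -0.4, 5.0])

# Scale the scene by s and move its origin by o; adapt the cameras accordingly.
s = 2.5
o = np.array([10.0, -3.0, 7.0])
M_s = s * M + o
C1_s = s * C1 + o
C2_s = s * C2 + o
P1_s = P1 / s
P2_s = P2 / s

# The projections are identical, although the baseline has grown by s.
same1 = np.allclose(project(P1, C1, M), project(P1_s, C1_s, M_s))
same2 = np.allclose(project(P2, C2, M), project(P2_s, C2_s, M_s))
baseline_ratio = np.linalg.norm(C2_s - C1_s) / np.linalg.norm(C2 - C1)
print(same1, same2, baseline_ratio)
```

Both comparisons should hold for any nonzero s and any o, which is exactly why the images alone cannot pin down the scale.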

Thanks. In SfM, the fundamental matrix or essential matrix is crucial. Does it mean that the fundamental matrix is not unique? In stereo imaging, two cameras will be calibrated simultaneously and maybe this can give more information. Can this produce the digital model without scale factor? — Jogging Song, Jan 23 '14 at 01:14
I have read the document of OpenCV again. The function stereoCalibrate can return the rotation and translation between two cameras, so one stereo camera can reconstruct the object without scale factor. — Jogging Song, Jan 23 '14 at 02:31
See the latest edit. I am not an OpenCV expert, but `stereoCalibrate` should give you one translation that is consistent with the recovered cameras. You can still rescale the translation and cameras. — user3146587, Jan 25 '14 at 13:01

Francesco Callari · Answer 2 · 2014-01-25T05:43:47.493

If you only have images, and no other information about the physical sizes of objects in the scene, you cannot recover those sizes from the images alone: at best you can reconstruct the scene up to an unknown scale factor. This means, for example, that you may be able to tell that two lines are perpendicular to each other. You may also be able to compute the width/height ratio of a rectangular tile, but without being able to tell what the individual values of the height and the width are.
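
To make this concrete, here is a small NumPy sketch under stated assumptions: a hypothetical 0.3 x 0.2 tile, an arbitrary unknown scale of 7 standing in for what a reconstruction would return, and illustrative helper names (`angle_deg`, `describe`). It shows that angles and length ratios survive the unknown scale, and that a single known physical length is enough to fix it.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def describe(corners):
    """Width, height and corner angle of a quadrilateral given as 4 x 3 corners."""
    w = corners[1] - corners[0]
    h = corners[3] - corners[0]
    return np.linalg.norm(w), np.linalg.norm(h), angle_deg(w, h)

# True tile (hypothetical dimensions) and its reconstruction at an unknown scale.
tile = np.array([[0.0, 0.0, 0.0],
                 [0.3, 0.0, 0.0],
                 [0.3, 0.2, 0.0],
                 [0.0, 0.2, 0.0]])
unknown_scale = 7.0
recon = unknown_scale * tile

w_true, h_true, a_true = describe(tile)
w_rec, h_rec, a_rec = describe(recon)

ratio_true = w_true / h_true   # recoverable from images
ratio_rec = w_rec / h_rec      # same ratio, different absolute sizes

# One measured length (here: the tile width is known to be 0.3) fixes the scale.
known_width = 0.3
metric = (known_width / w_rec) * recon
print(ratio_true, ratio_rec, a_rec, np.allclose(metric, tile))
```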

You can convince yourself that this is in fact the case by noticing that, for example, the images remain unchanged if you shrink all the objects in the scene by a factor S, and then move them closer to the camera by the same factor. This is what allows doing some old-school special effects in movies using miniature models (like these), and it works regardless of whether the camera is fixed or moving with respect to the scene, i.e. it applies to the multiple-image case as well.
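
The miniature argument can be sketched for a moving camera with known intrinsics. The intrinsic matrix K, the camera path, the scene points and the factor S = 20 below are all made-up values, and `rot_y` and `pixels` are hypothetical helpers for a standard pinhole model; this is only an illustration of the claim, assuming the scene and the camera path are both shrunk about the first camera position.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def pixels(K, R, C, X):
    """Pinhole projection of world points X (N x 3) for a camera at center C."""
    x_cam = (X - C) @ R.T          # world -> camera coordinates: R (X - C)
    x_img = x_cam @ K.T            # apply the intrinsics
    return x_img[:, :2] / x_img[:, 2:3]

# Known intrinsics (illustrative values).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Full-size scene and a camera moving along x while turning slightly.
scene = np.array([[0.0, 0.0, 10.0],
                  [1.0, 0.0, 12.0],
                  [-1.0, 1.0, 9.0],
                  [0.5, -1.0, 11.0]])
centers = [np.array([x, 0.0, 0.0]) for x in (0.0, 0.5, 1.0)]
rotations = [rot_y(a) for a in (0.0, -0.02, -0.04)]

# Miniature: everything, including the camera path, is S times smaller.
S = 20.0
mini_scene = scene / S
mini_centers = [C / S for C in centers]

unchanged = all(
    np.allclose(pixels(K, R, C, scene), pixels(K, R, C_m, mini_scene))
    for R, C, C_m in zip(rotations, centers, mini_centers)
)
print(unchanged)
```

Knowing K does not help here: the intrinsics are the same for the full-size and the miniature setup, and every frame has identical pixel coordinates.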
