If both the cameras and the scene are correspondingly scaled, the change will not be discernible in the captured images. That's why the scale factor is unknown in SfM. To obtain it, some physical measurement of the scene or the camera motion is typically needed.
If you're not convinced, simply do the math:
Let (P1 p1) and (P2 p2) be the 3x4 projection matrices of two cameras 1 and 2 (P1 is a 3x3 matrix and p1 a column vector), M a point in the scene, and m1 and m2 the respective projections of M in cameras 1 and 2. We have (~= means "is proportional to", because of the perspective division):
m1 ~= P1 M + p1
m2 ~= P2 M + p2
Introducing the camera centers C1 = -P1^-1 p1, C2 = -P2^-1 p2 and the translation T = C2 - C1 between the cameras, this can be written:
m1 ~= P1 (M - C1)
m2 ~= P2 (M - C2) = P2 (M - (C1 + T))
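The rewriting above relies on the identity P M + p = P (M - C) with C = -P^-1 p. As a quick sanity check, here is a minimal NumPy sketch; the random values and the variable names (`P`, `p`, `M`, `C`) are placeholders chosen for illustration, not data from a real camera.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical camera: P is the left 3x3 block of the 3x4 projection matrix,
# p is its last column. Random values stand in for a real calibration.
P = rng.normal(size=(3, 3))
p = rng.normal(size=3)
M = rng.normal(size=3)  # arbitrary scene point

# Camera center C = -P^-1 p (solve avoids forming the inverse explicitly).
C = -np.linalg.solve(P, p)

# Both forms give the same homogeneous image point.
lhs = P @ M + p
rhs = P @ (M - C)
centers_form_matches = np.allclose(lhs, rhs)
print(centers_form_matches)
```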
Now scale the whole scene by a factor of s and translate its origin by o: M' = s M + o. Introduce two cameras 1' and 2' that are versions of 1 and 2 with the inverse scaling factor, i.e. P1' = (1/s) P1 and P2' = (1/s) P2. Scale and offset their centers in the same way as the scene: C1' = s C1 + o and C2' = s (C1 + T) + o. The relative translation between the two cameras is now C2' - C1' = s T. The projections of M' in 1' and 2' are:
m1' ~= P1' (M' - C1') = (1/s) P1 (s M + o - s C1 - o) = P1 (M - C1) ~= m1
m2' ~= P2' (M' - C2') = (1/s) P2 (s M + o - s (C1 + T) - o) = P2 (M - C2) ~= m2
So in the end, you get the same projections (your input in an SfM problem) with a scene that has a different scale and origin and correspondingly scaled and translated cameras. This can be generalized to more than two cameras.
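If you would rather see numbers than algebra, the same argument can be checked with a short NumPy sketch. Everything here is an assumption made for illustration: the cameras are random 3x3 blocks rather than calibrated ones, `project` is a hypothetical helper, and the values of `s` and `o` are arbitrary.

```python
import numpy as np

def project(P, C, M):
    """Project 3D point M with a camera whose left 3x3 block is P and center is C."""
    m = P @ (M - C)
    return m[:2] / m[2]  # perspective division

rng = np.random.default_rng(1)

# Hypothetical two-camera setup (random stand-ins for real calibrations).
P1, P2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
C1 = rng.normal(size=3)
T = rng.normal(size=3)   # translation between the camera centers
C2 = C1 + T
M = rng.normal(size=3)   # scene point

# Arbitrary scale and origin shift applied to the whole scene.
s, o = 2.5, np.array([1.0, -2.0, 0.5])

M_s = s * M + o
C1_s = s * C1 + o
C2_s = s * (C1 + T) + o
P1_s, P2_s = P1 / s, P2 / s

# The image points are unchanged, while the baseline is scaled by s.
same_1 = np.allclose(project(P1, C1, M), project(P1_s, C1_s, M_s))
same_2 = np.allclose(project(P2, C2, M), project(P2_s, C2_s, M_s))
baseline_scaled = np.allclose(C2_s - C1_s, s * T)
print(same_1, same_2, baseline_scaled)
```

Since the images alone cannot distinguish the two configurations, no SfM algorithm working only from image measurements can recover s.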