I have a reference image A with a known position, and I want to calculate the position of the camera at image B relative to A (i.e. tx, ty, tz in meters). Both images are taken with the same camera, so the camera matrix stays the same. I'm using SIFT to detect keypoints and compute descriptors in both images, and I match them with FLANN. From the matches I estimate the homography matrix, which I decompose with cv::decomposeHomography(..). This function is based on this paper: PDF. The paper states that the translation vector is normalized by d*, which is the plane depth.

In order to get the correct translation I need to know the plane depth. Is there a way to get this without knowing the size of an object found in the image?

Oerdy
  • Can you explain the difference between "euclidean homography" and "projective homography"? Which of them did you compute, and which of them is assumed in `cv::decomposeHomography(..)`? – Micka Jun 16 '16 at 16:51
  • Well, the projective homography G is the homography in image space and the euclidean homography H is the homography in euclidean space (i.e. world coordinates). They are related by H = CamMatrix^-1 * G * CamMatrix. cv::findHomography gives me G, the projective homography. I assume that cv::decomposeHomography(...) expects the projective homography because it also takes the CamMatrix. Am I correct on this? – Oerdy Jun 16 '16 at 22:02
  • Is the translation vector returned by decomposeHomography homogeneous? – Oerdy Jun 18 '16 at 17:38

1 Answer

The 3D translation obtained from homography decomposition can only be recovered up to an unknown scale factor: the decomposition gives the translation divided by the plane depth d*, not the translation in meters. This is a classical limitation of computing 3D geometry from monocular images using only the apparent motion in the images. For this reason, 3D reconstructions from monocular images are typically called metric reconstructions (as opposed to Euclidean reconstructions, where the scale is resolved). To resolve the scale factor, some additional information is needed, such as the depth of a point on the plane or the distance the camera moved between the images.
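To make the two options concrete, here is a minimal sketch of how the normalized translation could be rescaled once one extra measurement is available. It assumes you already have one candidate normalized translation t/d* (for example, one of the translations returned by `cv2.decomposeHomographyMat` in OpenCV's Python bindings). The function name `recover_metric_translation` and its parameters `plane_depth` and `baseline` are hypothetical names chosen for illustration; they are not part of OpenCV.

```python
import math


def recover_metric_translation(t_normalized, plane_depth=None, baseline=None):
    """Rescale a normalized translation t/d* to metric units.

    Hypothetical helper, not an OpenCV function.
    Exactly one of the following must be given:
      plane_depth: distance d* from the reference camera to the plane (meters)
      baseline:    distance the camera moved between the two images (meters)
    Returns (t_metric, d_star).
    """
    if (plane_depth is None) == (baseline is None):
        raise ValueError("provide exactly one of plane_depth or baseline")

    if plane_depth is not None:
        # t = d* * (t / d*)
        d_star = plane_depth
    else:
        # ||t|| = d* * ||t / d*||, so d* = baseline / ||t / d*||
        norm = math.sqrt(sum(c * c for c in t_normalized))
        if norm == 0.0:
            raise ValueError("zero translation: the baseline cannot fix the scale")
        d_star = baseline / norm

    t_metric = [d_star * c for c in t_normalized]
    return t_metric, d_star


# Illustrative values only (assumed, not measured data).
t_norm = [0.1, 0.0, 0.2]

# Option 1: the depth of the plane is known.
t_from_depth, _ = recover_metric_translation(t_norm, plane_depth=2.5)
print(t_from_depth)

# Option 2: the distance travelled by the camera is known.
t_from_baseline, d_star = recover_metric_translation(t_norm, baseline=0.5)
print(t_from_baseline, d_star)
```

Note that the second option also gives you the plane depth d* as a by-product. Neither option works from the two images alone; the extra measurement has to come from somewhere outside the homography.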

Toby Collins