I have a reference image A taken at a known position, and I want to calculate the position of the camera at image B relative to it (i.e. tx, ty, tz in meters). Both images are taken with the same camera, so the camera matrix stays the same. I'm using SIFT to detect keypoints and compute descriptors in both images, and I match them with FLANN. From the matches I estimate the homography matrix, which I decompose with cv::decomposeHomographyMat(..). This function is based on this paper: PDF. The paper states that the translation vector is normalized by d*, the plane depth.
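To make the normalization concrete, here is a minimal sketch (plain C++, no OpenCV) of the plane-induced homography model as I understand it. The helper names `planeHomography` and `maxAbsDiff` are made up for illustration, and the `+` sign convention is an assumption; some references use `-`. It is only meant to show that the homography depends on the ratio t / d*, not on t and d* separately.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Euclidean homography induced by a plane with unit normal n at depth d
// (expressed in the reference camera frame): H = R + (t / d) * n^T.
// Assumption: "+" sign convention; other references define it with "-".
Mat3 planeHomography(const Mat3& R, const Vec3& t, const Vec3& n, double d) {
    assert(d > 0.0 && "plane depth must be positive");
    Mat3 H{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            H[i][j] = R[i][j] + (t[i] / d) * n[j];
        }
    }
    return H;
}

// Largest absolute element-wise difference between two 3x3 matrices.
double maxAbsDiff(const Mat3& A, const Mat3& B) {
    double diff = 0.0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            diff = std::max(diff, std::fabs(A[i][j] - B[i][j]));
        }
    }
    return diff;
}
```

With an identity rotation and normal (0, 0, 1), calling `planeHomography` with t = (0.1, 0, 0), d = 2 and with t = (0.5, 0, 0), d = 10 gives the same matrix up to rounding, which is why the decomposition can only return t / d*.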
To get the translation in metric units, I need to know the plane depth. Is there a way to obtain it without knowing the size of an object in the image?
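For reference, this is the rescaling step I have in mind once the depth is available from some source. It is a sketch under the assumption that the decomposition returns t / d* and that d* is given in meters; `toMetricTranslation` is a hypothetical helper, not an OpenCV function.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;

// Scale a depth-normalized translation (t / d*) back to metric units.
// Assumption: planeDepthMeters is the distance d* from the reference
// camera to the plane, measured in meters.
Vec3 toMetricTranslation(const Vec3& tNormalized, double planeDepthMeters) {
    assert(planeDepthMeters > 0.0 && "plane depth must be positive");
    return {tNormalized[0] * planeDepthMeters,
            tNormalized[1] * planeDepthMeters,
            tNormalized[2] * planeDepthMeters};
}
```

For example, a normalized translation of (0.25, 0, -0.5) with a plane depth of 4 m would correspond to a metric translation of (1, 0, -2) m.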