
I have a calibrated (virtual) camera in Blender that views a roughly planar object. I render an image from a first camera pose P0 and then move the camera to a new pose P1. So I have the 4x4 camera matrix for both views, from which I can calculate the transformation between the cameras as given below. I also know the intrinsics matrix K. Using those, I want to map the points from the image rendered at P0 to a new image seen from P1 (of course, I have the ground truth to compare against, because I can render in Blender after the camera has moved to P1). If I only rotate the camera between P0 and P1, I can calculate the homography perfectly. But if there is translation, the calculated homography matrix does not take that into account. The theory says that, after calculating M10, the last row and column should be dropped for a planar scene. However, when I check M10, I see that the translation values are in the rightmost column, which I drop to get the 3x3 homography matrix H10. Then, if there is translation but no rotation, H10 is equal to the identity matrix. What is going wrong here?

Edit: I know that the images are related by a homography because, given the two images from P0 and P1, I can find a homography (by feature matching) that perfectly maps the image from P0 to the image from P1, even in the presence of translational camera movement.

[image]

chronosynclastic
  • All this math is often presented sloppily. That page won't suffice. Don't just assume that you can drop 4x4 down to 3x3 regardless; there may be conditions. Don't assume that the explanation is correct, or that you understood it correctly. Look for the derivation/proof of that step and check it. -- I can't dive into this; this isn't trivial stuff, and it takes time to understand. -- Some years ago I tried to derive a homography for a camera view of a plane placed in space... that went badly. More recently, with fewer degrees of freedom, it went quite smoothly... *reduce the problem* – Christoph Rackwitz Nov 13 '21 at 00:03
  • The matrix `H10` depends on the choice of the plane `n0 . p + c0 = 0`. The matrix `M10` encoding the rotation and translation between the two positions can be fixed, meaning the two positions 0 and 1 can be fixed relative to the world coordinate system, but if you change `n0 . p + c0 = 0`, the matrix `H10` has to change too. So `H10` depends on `M10` and on `n0 . p + c0 = 0`. – Futurologist Nov 18 '21 at 02:44
  • So if we set `d0=0` as the book says, does it mean that there is no translation? It seems to me that only then is it possible to drop the last row and column of `M10` to get `H10`. – chronosynclastic Nov 18 '21 at 09:34
  • 1
    Definitely, in the general case you cannot be setting `d0=0`. In the case of pure rotation, when the camera center doesn't change, it seems to me that you can drop the d0 coordinate, as then the homography has purely 2D projective nature. Otherwise, when the cetner of the camera moves, the homography is not purly 2D projective, but it is a restriction of a 3D projective map. – Futurologist Nov 18 '21 at 14:57
  • 1
    I guess the question is: do you know the position of the plane `n0 . p + c0` in 3D? Because of you do not, then I do not think you can find H10 in the general case, when there is a translation of the center of the camera. Unless you have a bunch of matching points on the two images., which will allow you to construct `H10` and reconstruct the plane . – Futurologist Nov 18 '21 at 15:40
  • That's a very good point. Having read about the topic from another source, it became clear to me that the position of the plane, and hence the distance between the plane and the camera, must be known in order to compute `H10` in the case of a camera translation. This distance can be computed as the dot product between the plane normal and a point on the plane. I will add an answer to describe the exact procedure. – chronosynclastic Nov 19 '21 at 16:04

1 Answer


The theory became clearer to me after reading two other books: "Multiple View Geometry" by Hartley and Zisserman (Example 13.2) and particularly "An Invitation to 3-D Vision: From Images to Geometric Models" (Section 5.3.1, Planar homography). Below is an outline; please check the above-mentioned sources for a thorough explanation.

Consider two images of points p on a 2D plane P in 3D space. The transformation between the two camera frames can be written as

X2 = R*X1 + T    (1)

where X1 and X2 are the coordinates of the world point p in camera frames 1 and 2, respectively, and R and T are the rotation and translation between the two camera frames. Denoting the unit normal vector of the plane P with respect to the first camera frame as N, and the distance from the plane P to the first camera as d, the plane equation gives N.T*X1 = d (where .T means transpose), or equivalently

(1/d)*N.T*X1 = 1    (2)

for all X1 on the plane P. Substituting (2) into (1) gives

X2 = R*X1 + T*(1/d)*N.T*X1 = (R + (1/d)*T*N.T)*X1.

Therefore, the planar homography matrix (3x3) can be extracted as H = R + (1/d)*T*N.T, so that X2 = H*X1. This is a linear transformation from X1 to X2.
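
As a concrete illustration, here is a minimal numpy sketch of that formula. It assumes R1, t1 and R2, t2 are the world-to-camera extrinsics of the two views (X_cam = R*X_world + t), n1 is the unit plane normal expressed in the first camera frame, and d1 is the distance from the first camera center to the plane; these variable names are mine, not from any particular library. Also note that Blender's `matrix_world` is a camera-to-world matrix and the Blender camera looks along its local -Z axis, so it has to be inverted and converted to the usual computer-vision convention before being used like this.

```python
import numpy as np

def euclidean_homography(R1, t1, R2, t2, n1, d1):
    """H = R + (1/d) * T * N^T, mapping camera-1 coordinates to camera-2
    coordinates for points on the plane n1 . X1 = d1."""
    # Relative pose between the two views: X2 = R_1to2 @ X1 + t_1to2
    R_1to2 = R2 @ R1.T
    t_1to2 = t2 - R_1to2 @ t1
    # T * N^T is the outer product of the 3-vector translation and the normal
    return R_1to2 + (1.0 / d1) * np.outer(t_1to2, n1)
```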

The distance d can be computed as the dot product between the plane normal and a point on the plane (both expressed in the first camera frame). Then, the camera intrinsics matrix K should be used to calculate the projective homography G = K * (R + (1/d)*T*N.T) * inv(K). If you are using software like Blender or Unity, you can set the camera intrinsics yourself and thus obtain K. For Blender, a nice code snippet is given in this excellent answer.
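
Continuing the sketch above, the projective homography and the warped image could then be obtained roughly like this (assuming img0 is the image rendered at pose P0 and K is the 3x3 intrinsics matrix; again, the variable names are my own):

```python
import cv2

# Projective homography in pixel coordinates: G = K * H * K^-1
H = euclidean_homography(R1, t1, R2, t2, n1, d1)
G = K @ H @ np.linalg.inv(K)
G /= G[2, 2]  # normalize so the bottom-right entry is 1 (optional)

# Warp the P0 render into the P1 view and compare it with the
# ground-truth render from Blender.
height, width = img0.shape[:2]
warped = cv2.warpPerspective(img0, G, (width, height))
```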

OpenCV has a nice code example in this tutorial; see "Demo 3: Homography from the camera displacement".

chronosynclastic
  • I am having a problem similar to the one you describe. I am following "Demo 3: Homography from the camera displacement" and tried to convert it to Python, but when I introduce translation on camera 2, it does not seem to project correctly. The Euclidean homography is defined as `homography_euclidean = R_1to2 + d_inv1 * np.dot(np.array(t_1to2).reshape(3, -1), normal.T)`. Do you know what might be the problem? – Darwin Harianto May 10 '23 at 06:37