
I want to perform 360° panorama stitching for 6 fisheye cameras.

In order to find the relation among the cameras I need to compute the homography matrix, which is usually done by finding features in the images and matching them.

However, for my camera setup I already know:

  • The intrinsic camera matrix K, which I computed through camera calibration.
  • Extrinsic camera parameters R and t. The camera orientation is fixed and does not change at any point. The cameras are located on a circle of known diameter d, with each camera positioned 60° apart from the next along the circle (see the sketch below).
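
For reference, a minimal numpy sketch of how I model these extrinsics (my own naming; it assumes the world origin at the circle centre, the y axis vertical, and each camera looking radially outward):

```python
import numpy as np

d = 1.0  # circle diameter in metres

def camera_pose(i):
    """Hypothetical extrinsics for camera i (0..5): world origin at the
    circle centre, cameras 60 deg apart, each looking radially outward."""
    theta = np.radians(60.0 * i)
    c, s = np.cos(theta), np.sin(theta)
    # World-to-camera rotation: camera i's optical axis points outward
    # at angle theta, rotated about the vertical y axis.
    R_i_0 = np.array([[c, 0.0, -s],
                      [0.0, 1.0, 0.0],
                      [s, 0.0, c]])
    # Camera centre on the circle (radius d / 2), in world coordinates.
    C_i = (d / 2.0) * np.array([s, 0.0, c])
    # Translation for the convention x_cam = R @ X_world + t.
    t_i = -R_i_0 @ C_i
    return R_i_0, t_i
```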

Therefore, I think I could compute the homography matrix directly from these parameters, which I assume would be more accurate than performing feature matching.

In the literature I found the following formula to compute the homography matrix relating image 2 to image 1:

H_2_1 = (K_2) * (R_2)^-1 * R_1 * K_1

This formula only accounts for the rotation between the cameras, not for the translation that exists in my case.

How could I plug the translation t of each camera into the computation of H?

I have already tried computing H without considering the translation, but since d > 1 meter, the images are not accurately aligned in the panorama picture.

EDIT:

Based on Francesco's answer below, I have the following questions:

  • After calibrating the fisheye lenses, I got a matrix K with a focal length of f = 620 pixels for an image of size 1024 × 768. Is that considered a big or a small focal length?

  • My cameras are located on a circle with a diameter of 1 meter. The explanation below makes it clear to me that, due to this "big" translation between the cameras, I get noticeable ghosting with objects that are relatively close to them. So if the homography model cannot fully represent the positions of the cameras, is it possible to use another model, like the fundamental/essential matrix, for image stitching?

makolele12
  • There is no "big" or "small" in absolute terms, it depends on how far the objects in the scene you want to look at are. 2 * atan(512/620) ~ 79deg, are you sure these lenses are fisheye? It is certainly possible to stitch with models other than a simple homography. You may want to look into the panotools software – Francesco Callari Aug 05 '20 at 18:55
  • @FrancescoCallari my cameras have a view similar to the following picture (the one at the top): https://upload.wikimedia.org/wikipedia/commons/2/2c/Panotools5618.jpg . I got those focal length values from the K matrix calculated with OpenCV's fisheye camera calibration sample code; are they not what one would expect for a fisheye camera? I am developing a real-time stitcher and I am working mainly with OpenCV. Could you tell me which other models there are that could represent the translation, so I can do some research on them? Thanks again! – makolele12 Aug 06 '20 at 06:34
  • I have a similar application. Did you get this to work with OpenCV ? – Cary H Jan 17 '22 at 18:19
  • @CaryH yes I did. However, I had to stick with the homography matrix and live with the ghosting my system had by design; translation cannot be plugged in there. To avoid ghosting, try other methods like seam finding and multiband blending (available in OpenCV); a sketch follows below. – makolele12 Jan 19 '22 at 23:13
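
A minimal sketch of the multiband blending route mentioned in the comment above, patterned after OpenCV's stitching_detailed.py sample (the images, masks, and corners below are dummy placeholders for a real pipeline's warped inputs):

```python
import cv2 as cv
import numpy as np

# Placeholders: warped images in panorama coordinates, their valid-pixel
# masks, and their top-left corners in the panorama.
img1 = np.full((480, 640, 3), 100, np.uint8)
img2 = np.full((480, 640, 3), 180, np.uint8)
mask1 = np.full((480, 640), 255, np.uint8)
mask2 = np.full((480, 640), 255, np.uint8)
corners = [(0, 0), (320, 0)]                 # 50% horizontal overlap
sizes = [(640, 480), (640, 480)]             # (width, height) per image

dst_roi = cv.detail.resultRoi(corners=corners, sizes=sizes)

blender = cv.detail_MultiBandBlender()
blender.setNumBands(4)
blender.prepare(dst_roi)
# feed() expects 16-bit signed images, as in the OpenCV sample.
blender.feed(img1.astype(np.int16), mask1, corners[0])
blender.feed(img2.astype(np.int16), mask2, corners[1])
pano, pano_mask = blender.blend(None, None)
pano = np.clip(pano, 0, 255).astype(np.uint8)
```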

1 Answer


You cannot "plug" the translation in: its presence along with a nontrivial rotation mathematically implies that the relationship between images is not a homography.

However, if the imaged scene is "far enough" from the cameras, i.e. if the translations between the cameras are small compared to the distances of the scene objects from the cameras, and the cameras' focal lengths are small enough, then you may use the homography induced by a pure rotation as an approximation.
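
For a rough sense of "far enough": the residual parallax between two views is on the order of f * B / Z pixels, for focal length f (in pixels), baseline B, and scene depth Z. A quick check with the numbers from the question (f = 620 px, baseline up to the 1 m circle diameter):

```python
# Rough parallax estimate: disparity ~ f * B / Z pixels.
f = 620.0  # focal length in pixels (from the question's calibration)
B = 1.0    # worst-case baseline: the 1 m circle diameter
for Z in (2.0, 10.0, 50.0):  # scene depth in metres
    print(f"Z = {Z:4.0f} m -> residual parallax ~ {f * B / Z:5.1f} px")
# Z =    2 m -> residual parallax ~ 310.0 px
# Z =   10 m -> residual parallax ~  62.0 px
# Z =   50 m -> residual parallax ~  12.4 px
```

This is why nearby objects ghost badly while distant ones align acceptably.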

Your equation is wrong. The correct formula is obtained as follows:

  • Take a pixel in camera 1: p_1 = (x, y, 1) in homogeneous coordinates
  • Back project it into a ray in 3D space: P_1 = inv(K_1) * p_1
  • Decompose the ray in the coordinates of camera 2: P_2 = R_2_1 * P_1
  • Project the ray into a pixel in camera 2: p_2 = K_2 * P_2
  • Put the equations together: p_2 = [K_2 * R_2_1 * inv(K_1)] * p_1

The product H = K_2 * R_2_1 * inv(K_1) is the homography induced by the pure rotation R_2_1. The rotation transforms points into frame 2 from frame 1. It is represented by a 3x3 matrix whose columns are the components of the x, y, z axes of frame 1 decomposed in frame 2. If your setup gives you the rotations of all the cameras with respect to a common frame 0, i.e. as R_i_0, then R_2_1 = R_2_0 * transpose(R_1_0).
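
In code this is a one-liner with numpy (a minimal sketch; K1, K2, R_1_0, R_2_0 stand for the calibrated intrinsics and the per-camera rotations with respect to the common frame 0):

```python
import numpy as np

def rotation_homography(K1, K2, R_1_0, R_2_0):
    """Homography mapping pixels of camera 1 into camera 2 under the
    pure-rotation approximation (translation ignored)."""
    R_2_1 = R_2_0 @ R_1_0.T                 # frame-1-to-frame-2 rotation
    return K2 @ R_2_1 @ np.linalg.inv(K1)

# Usage: map pixel (x, y) of image 1 into image 2.
# p2 = rotation_homography(K1, K2, R_1_0, R_2_0) @ np.array([x, y, 1.0])
# x2, y2 = p2[:2] / p2[2]                   # back to inhomogeneous coords
```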

Generally speaking, you should use the above homography as an initial estimate, to be refined by matching points and optimizing. This is because (a) the homography model itself is only an approximation (since it ignores the translation), and (b) the rotations given by the mechanical setup (even a calibrated one) are affected by errors. Using matched pixels to optimize the transformation will minimize the errors where it matters, on the image, rather than in an abstract rotation space.
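
A sketch of that refinement with OpenCV (ORB matching plus RANSAC; the rotation-induced H above is not passed to the estimator, but it is a useful sanity check on the result):

```python
import cv2 as cv
import numpy as np

def refine_homography(img1, img2):
    """Estimate a homography between two overlapping images from
    matched ORB features, robustly via RANSAC."""
    orb = cv.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Needs at least 4 good matches; 3.0 px is the RANSAC threshold.
    H, inliers = cv.findHomography(pts1, pts2, cv.RANSAC, 3.0)
    return H
```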

Francesco Callari
  • Thanks a lot for the helpful answer, I edited my post adding 2 new questions based on your answer. – makolele12 Aug 05 '20 at 06:40
  • Is it not a valid homography if we add the translation to the camera extrinsic matrix, e.g. `[R|t]`, where `R` is the rotation matrix and `t` is the translation vector? – chronosynclastic Aug 30 '21 at 14:54
  • The views from two pinhole cameras roto-translated with respect to each other are not related by a homography. Apart from the math, you can convince yourself that this is the case by noticing that in this situation you can have scene objects with self-occlusions. For example, a box in the scene may have one face visible in one camera but not the other. This is not possible when the camera undergoes a pure rotation. – Francesco Callari Aug 31 '21 at 15:58