I am currently working on the transformation between object, camera, and world coordinates in an inverse image projection task. I have the following information available:

- the image coordinates of an object in homogeneous form (u, v, 1)
- Euler angles converted to a rotation matrix (R)
- a translation vector (t) representing the distances between the camera and the GNSS receiver
- the camera intrinsic matrix (K) and the GNSS position
To go from image coordinates to world coordinates, I have followed the steps outlined in the literature. First, I computed the camera coordinates (Xc, Yc, Zc), up to a scale factor λ, by inverting the intrinsic matrix K:
λ * [Xc, Yc, Zc] = K^(-1) * [u, v, 1]
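For reference, here is a small numpy sketch of how I am doing this step; the intrinsics, pixel coordinates, and depth value are placeholders I made up for illustration:

```python
import numpy as np

# Placeholder intrinsics (focal lengths and principal point are made-up values)
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

u, v = 700.0, 400.0                 # detected pixel coordinates
pixel_h = np.array([u, v, 1.0])     # homogeneous image coordinates

# Back-projection gives only a ray direction in the camera frame;
# the absolute depth along that ray (the scale λ) is still unknown.
ray_cam = np.linalg.inv(K) @ pixel_h

# If the depth Zc were known, the camera coordinates would follow as:
Zc = 10.0                           # placeholder depth in metres
Xc, Yc, Zc = Zc * ray_cam           # ray_cam = (Xc/Zc, Yc/Zc, 1)
print(Xc, Yc, Zc)
```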
Next, I want to transform the camera coordinates (Xc, Yc, Zc) to world coordinates (X, Y, Z) by inverting the extrinsic matrix [R | t]. Is this equation correct?
[X, Y, Z] = R^(-1) * ( [Xc, Yc, Zc] * t )
Here is where my confusion arises. In my specific case, the desired world coordinates correspond to the objects detected by the camera. Therefore, I believe that I should add the translation vector (t) to my camera coordinates as follows:
[X, Y, Z] = R^(-1) * ( [Xc, Yc, Zc] + t )
However, from my understanding of the documentation, it seems that t is typically subtracted instead:
[X, Y, Z] = R^(-1) * ( [Xc, Yc, Zc] - t )
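If I have understood the usual convention correctly, the extrinsics map world coordinates to camera coordinates, X_c = R * X_w + t, and inverting that relation gives the subtraction form X_w = R^(-1) * (X_c - t). Here is a small numpy sketch of that interpretation; the Euler angles, translation, and camera point are placeholders:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Placeholder extrinsics: R from made-up Euler angles, t as my hand-measured offset
R = Rotation.from_euler('xyz', [10.0, 5.0, -3.0], degrees=True).as_matrix()
t = np.array([0.5, 0.1, 1.2])            # camera-to-GNSS offset (placeholder values)

cam_point = np.array([2.0, -1.0, 10.0])  # a point expressed in camera coordinates

# If the extrinsics are defined world -> camera, i.e. X_c = R @ X_w + t,
# then the inverse mapping camera -> world subtracts t before rotating back:
world_point = np.linalg.inv(R) @ (cam_point - t)

# Equivalent shortcut: R is orthonormal, so its inverse is its transpose
world_point_alt = R.T @ (cam_point - t)
print(world_point, world_point_alt)
```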
I would appreciate clarification on whether my understanding is correct, and whether I should add, subtract, or multiply by the translation vector in my case.
It is important to note that the translation vector (t) I have was obtained by manually measuring the distances between the camera and the GNSS receiver; I have not obtained a t vector through solvePnP. Do I need to use the solvePnP-generated t vector, or is the manually measured t vector sufficient for my purposes?
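For completeness, this is roughly how I understand the solvePnP route would look if I had 3D-2D correspondences; the object and image points below are made-up placeholders, not my real data:

```python
import numpy as np
import cv2

# Placeholder 3D-2D correspondences (solvePnP needs at least 4 point pairs)
object_points = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]], dtype=np.float64)
image_points = np.array([[320.0, 240.0],
                         [420.0, 238.0],
                         [425.0, 340.0],
                         [318.0, 342.0]], dtype=np.float64)

K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
dist_coeffs = np.zeros(5)                 # assuming no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
R_pnp, _ = cv2.Rodrigues(rvec)            # rotation vector -> rotation matrix

# My understanding: this tvec is the world -> camera translation, so a camera
# point would map back to the world frame as X_w = R_pnp.T @ (X_c - tvec.ravel())
```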