Understanding the solvePnP Algorithm

Question

I'm having trouble understanding the Perspective-n-Point problem. A few questions:

What is s for? Why do we need a scale factor for the image point?
Is K[R|T] a "change of coordinates matrix" which moves p_w, the homogeneous world point, into the coordinate space of the 2D image plane?
I understand that [R|T] represents the "rotation and translation" of the camera relative to the corresponding world point p_w and that is what we are trying to solve for. What's particularly difficult about this? Can't we just say [R|T] =inv(K)s(p_c)inv(p_w)? I just did this with some basic matrix algebra.
I don't understand why PnP has multiple solutions... what are these multiple solutions exactly?

Thanks for any help!

Sorry, but they are all very closely related and I would rather not make 4 separate posts on pretty much the same thing — Carpetfizz, Sep 28 '17 at 01:46

Kamil Szelag · Accepted Answer · 2017-09-28T07:35:34.350


The scale factor is needed because the image alone cannot tell you whether you are looking at a small object from a short distance or a large object from a greater distance.

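In the typical pinhole camera equation

`s p_c = K [R|t] p_w`

(with p_c the homogeneous image point and p_w the homogeneous world point, as in the question), s is the Z coordinate of the point in the camera coordinate system.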
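To make this concrete, here is a minimal sketch assuming the pinhole model `s p_c = K [R|t] p_w`. The intrinsics, pose and points are made-up values for illustration, and `mat_vec` and `project` are hypothetical helper names. It shows that s comes out as the camera-frame Z of the point, and that a near point and a far point on the same ray land on the same pixel with different s.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Hypothetical pose [R|t]: no rotation, world origin 3 units in front of the camera.
Rt = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 3.0]]

def project(p_w):
    """Return (s, u, v) for a homogeneous world point p_w = [X, Y, Z, 1]."""
    p_cam = mat_vec(Rt, p_w)        # point in the camera coordinate system
    su, sv, s = mat_vec(K, p_cam)   # this is s * [u, v, 1]
    return s, su / s, sv / s

# Two world points chosen so that their camera-frame coordinates,
# (0.25, -0.125, 4.0) and (0.5, -0.25, 8.0), lie on the same viewing ray.
near_point = [0.25, -0.125, 1.0, 1.0]
far_point = [0.5, -0.25, 5.0, 1.0]

s_near, u_near, v_near = project(near_point)
s_far, u_far, v_far = project(far_point)

print(s_near, u_near, v_near)  # 4.0 370.0 215.0
print(s_far, u_far, v_far)     # 8.0 370.0 215.0
```

Both points hit pixel (370, 215). Only s differs, and s is exactly the information a single image does not give you.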

Right, K[R|t] is the projection matrix, which maps 3D coordinates in some object/world/global coordinate system to 2D image coordinates, as in the equation above.
It is not so easy, because you usually don't know the point coordinates in the camera coordinate system; you only know the 2D coordinates in the image coordinate system. The transformation from the camera coordinate system to the image coordinate system loses one dimension, and the scale factor makes the equation not exactly linear. That's why it is not so easy to compute.
Different algorithms use different approaches to add the extra information needed for a solution. For example, the DLT (direct linear transform) method exploits properties of the projection matrix. Besides analytic solutions, there are also many methods that use nonlinear optimization, for example Levenberg-Marquardt, which is used in OpenCV.
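As a rough illustration of what the nonlinear-optimization methods work with: once you divide by s, the pixel coordinates are a nonlinear function of the pose, so these methods typically search for the R and t that minimize the reprojection error. The sketch below only evaluates that error; it is not a solver. It assumes zero skew and no lens distortion, and the function names and all numbers are hypothetical.

```python
def project(K, R, t, p_w):
    """Pinhole projection of a 3D world point (zero skew, no distortion)."""
    # Move the point into the camera coordinate system: p_cam = R * p_w + t.
    p_cam = [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]
    s = p_cam[2]  # depth in the camera frame, i.e. the scale factor
    u = K[0][0] * p_cam[0] / s + K[0][2]
    v = K[1][1] * p_cam[1] / s + K[1][2]
    return u, v

def reprojection_error(K, R, t, world_pts, image_pts):
    """Sum of squared pixel distances between observed and projected points."""
    total = 0.0
    for p_w, (u_obs, v_obs) in zip(world_pts, image_pts):
        u, v = project(K, R, t, p_w)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total

# Made-up intrinsics and a made-up "true" pose used to synthesize observations.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R_true = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
t_true = [0.0, 0.0, 5.0]

# Four corners of a square in the world plane Z = 0.
world_pts = [[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0],
             [1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]]
image_pts = [project(K, R_true, t_true, p) for p in world_pts]

# The true pose reproduces the observations; a shifted guess does not.
err_true = reprojection_error(K, R_true, t_true, world_pts, image_pts)
err_guess = reprojection_error(K, R_true, [0.1, 0.0, 5.0], world_pts, image_pts)
print(err_true, err_guess)
```

An iterative solver would start from some guess for the pose and repeatedly adjust it to push this error down. In OpenCV this is what `cv2.solvePnP` does with its default iterative method, so in practice you would call that rather than write your own.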

edited Sep 28 '17 at 07:35

answered Sep 28 '17 at 07:23

Kamil Szelag

Thank you so much for your answer! For point 3, do you mind specifying which variable is the point in the camera coordinate system which I’m assuming knowledge of? – Carpetfizz Sep 28 '17 at 14:06
The camera coordinate system is the system whose center is located at the camera focus; x and y are parallel to the image plane and z is normal to that plane. A point in the camera coordinate system has coordinates in the same units as in the global coordinate system. During calibration you usually (except in special cases) do not measure the point's position relative to the camera focus. – Kamil Szelag Sep 28 '17 at 16:31
Okay, thanks again. I guess I'm still a little confused on where in `[R|T] =inv(K)s(p_c)inv(p_w)` I'm assuming knowledge of a point in the **camera coordinate system**. What is the high level objective of the algorithms you described in point 4? – Carpetfizz Sep 28 '17 at 16:52
