Need help understanding the Perspective-Three-Point

Question

I'm following this explanation on the P3P problem and have a few questions.

In the heading labeled Section 1 they project the image plane points onto a unit sphere. I'm not sure why they do this, is this to simulate a camera lens? I know in OpenCV, we first compute the intrinsics of the camera and factor it into solvePnP. Is this unit sphere serving a similar purpose?
Also in Section 1, where did $u^{'}_x$, $u^{'}_y$, and $u^{'}_z$ come from, and what are they? If we are projecting onto a 2D plane then why do we need the third component? I know the standard answer is "because homogeneous coordinates" but I can't seem to find an explanation as to why we use them or what they really are.
Also in Section 1 what does "normalize using L2 norm" mean, and what did this step accomplish?

I'm hoping if I understand Section 1, I can understand the notation in the following sections.

Thanks!

Cross-posted: https://stackoverflow.com/q/47942857/781723, https://cs.stackexchange.com/q/85807/755. Please [do not post the same question on multiple sites](https://meta.stackexchange.com/q/64068). Each community should have an honest shot at answering without anybody's time being wasted. — D.W., Mar 01 '18 at 21:06

Leandro Caniglia · Answer 1 · 2017-12-23T14:23:51.143

Here are some hints:

The projection onto the unit sphere has nothing to do with the camera lens. It is just a mathematical transformation intended to simplify the P3P equation system (whose solutions we are trying to compute).
$u'_x$ and $u'_y$ are the coordinates of $(u,v) - P$ (here $P=(c_x, c_y)$), normalized by the focal distances $f_x$ and $f_y$. Subtracting the camera optical center $P$ translates the origin to that point. Introducing the $z$ coordinate $u'_z=1$ moves the 2D point $(u'_x, u'_y)$ to the 3D plane defined by the equation $z=1$ (the plane parallel to the $xy$ plane). Once the points lie on the plane $z=1$, you can visualize them as the intersections of that plane with 3D lines that pass through the camera center (the 3D origin). In other words, these points become the projections onto a 2D plane of 3D points located somewhere on those lines (the plane itself sits at the focal distance, which has been "normalized" to 1 by dividing by $f_x$ and $f_y$). Again, all of these transformations are intended to make the equations easier to solve.
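The so-called $L2$ norm is nothing but the usual distance that comes from the Pythagorean theorem ($a^2 + b^2 = c^2$), here used to measure the length of a vector in 3D space. "Normalizing" means dividing the vector $(u'_x, u'_y, u'_z)$ by that length, so the result has length 1 and therefore lies on the unit sphere.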
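
The three steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the linked explanation: the function name `pixel_to_bearing` and the intrinsic values in the usage lines are assumptions made up for the example.

```python
import math

def pixel_to_bearing(u, v, fx, fy, cx, cy):
    """Map a pixel (u, v) to a unit vector on the sphere (hypothetical helper)."""
    # Shift the origin to (cx, cy) and divide by the focal distances.
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0  # lift the 2D point onto the plane z = 1
    # L2 norm: the ordinary Euclidean length of (x, y, z).
    norm = math.sqrt(x * x + y * y + z * z)
    # Dividing by the norm keeps the direction and sets the length to 1.
    return (x / norm, y / norm, z / norm)

# Made-up intrinsics, for illustration only.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
print(pixel_to_bearing(320.0, 240.0, fx, fy, cx, cy))  # pixel at (cx, cy) -> (0.0, 0.0, 1.0)
```

A pixel exactly at $(c_x, c_y)$ maps to the direction straight along the $z$ axis; any other pixel maps to a tilted direction of the same unit length.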

Thank you for the detailed response. A few follow ups: How does projecting onto a unit sphere simplify the system? What gives someone the intuition to do this? — Carpetfizz, Dec 23 '17 at 05:42
Good question. I haven't studied the equations in enough depth to answer that. I've just tried to give you some clues to help you move forward, because I admire your courage. What I can say is that projecting onto a unit sphere is a well-known technique when you have to deal with 3D angles, because directions can be characterized by points on the sphere. Also, the technique of moving 2D objects to the plane $z=1$ is well known from projective geometry, where it simplifies equations by making them *homogeneous*. If you want to know more, I would recommend reading up on these topics. — Leandro Caniglia, Dec 23 '17 at 14:15
Great, thanks! I've started reading H&Z which has a section on projective geometry — Carpetfizz, Dec 23 '17 at 14:36
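
One concrete payoff of working with unit vectors, offered here as a sketch rather than as the reasoning of the linked explanation: the cosine of the angle between two viewing rays is simply their dot product, with no division by lengths. The helper name `cos_angle` and the sample vectors are assumptions for illustration.

```python
import math

def cos_angle(a, b):
    """Cosine of the angle between two unit vectors (hypothetical helper)."""
    # For unit vectors, a . b = |a| |b| cos(theta) = cos(theta).
    return sum(ai * bi for ai, bi in zip(a, b))

s = 1.0 / math.sqrt(2.0)
ray_center = (0.0, 0.0, 1.0)  # ray along the optical axis
ray_side = (s, 0.0, s)        # ray tilted 45 degrees toward +x
print(math.degrees(math.acos(cos_angle(ray_center, ray_side))))  # approximately 45
```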
