The projection of a 3D point (x, y, z) to the 2D image coordinates (X, Y) can be calculated as a vector-matrix multiplication in homogeneous coordinates:
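$$
\begin{bmatrix}
a_{00} & a_{01} & a_{02} & a_{03} \\
a_{10} & a_{11} & a_{12} & a_{13} \\
a_{20} & a_{21} & a_{22} & a_{23} \\
a_{30} & a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} XW \\ YW \\ ZW \\ W \end{bmatrix}
$$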
with
$$
\begin{bmatrix} XW \\ YW \\ ZW \\ W \end{bmatrix}
=
\begin{bmatrix}
x a_{00} + y a_{01} + z a_{02} + a_{03} \\
x a_{10} + y a_{11} + z a_{12} + a_{13} \\
x a_{20} + y a_{21} + z a_{22} + a_{23} \\
x a_{30} + y a_{31} + z a_{32} + a_{33}
\end{bmatrix}
$$
The image coordinates (X, Y) are then obtained by dividing the first and second rows of the result vector by its fourth row, W. This step is the conversion from homogeneous to Cartesian coordinates.
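The two steps above (multiply, then divide by W) can be sketched in a few lines of plain Python. This is only an illustration: the helper names `mat_vec` and `project` and the example matrix `A` are made up here and are not taken from any graphics API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def project(m, point):
    """Apply a 4x4 matrix to a 3D point, then do the homogeneous divide."""
    x, y, z = point
    xw, yw, zw, w = mat_vec(m, [x, y, z, 1.0])
    # Conversion from homogeneous to Cartesian coordinates.
    return (xw / w, yw / w, zw / w)


# Hypothetical example matrix: the identity, except that the fourth row
# is [0 0 -1 0], so that W = -z.
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

# A point two units in front of a camera that looks down the negative z axis.
X, Y, Z = project(A, (2.0, 4.0, -2.0))
print(X, Y, Z)
```

With this matrix the point (2, 4, -2) gives W = 2, so x and y are simply halved. Points further away get a larger W and therefore end up closer to the image center.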
The third row of the OpenGL projection matrix is set up so that Z becomes the projected depth: z values between n and f (the near and far planes) are mapped to the range -1...1. It is then used for the depth test and clipping. Because the fourth row is [0 0 -1 0], W equals -z, so the conversion from homogeneous to Cartesian coordinates corresponds to a division by -z, which results in the perspective transformation (with inverted depth).
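To check the depth mapping numerically, here is a small sketch that builds a matrix in the commonly documented `gluPerspective` layout and evaluates only its third and fourth rows. The function names `perspective` and `projected_depth` and the chosen parameter values are assumptions made for this example. It also assumes the usual OpenGL convention that the camera looks down the negative z axis, so the near and far planes sit at z = -n and z = -f.

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """4x4 perspective matrix in the gluPerspective layout (list of rows)."""
    t = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [t / aspect, 0.0, 0.0, 0.0],
        [0.0, t, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def projected_depth(m, z):
    """Return (Z, W) for a point with eye-space depth z."""
    zw = m[2][2] * z + m[2][3]  # third row: Z * W
    w = m[3][2] * z + m[3][3]   # fourth row: W = -z
    return zw / w, w


near, far = 1.0, 100.0
P = perspective(60.0, 1.5, near, far)

z_near, w_near = projected_depth(P, -near)  # point on the near plane
z_far, w_far = projected_depth(P, -far)     # point on the far plane
print(z_near, w_near)
print(z_far, w_far)
```

Up to floating-point rounding, the near plane lands at Z = -1 and the far plane at Z = 1, and in both cases W is exactly -z.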
Any other way of expressing the projection would involve the same steps, namely the linear transformation followed by the division by the depth (here W = -z) for the perspective foreshortening. Matrices are the usual representation in linear algebra for these operations.
This is not specific to perspective projection: many 3D transformations can be expressed as a 4x4 matrix, including rotations, translations, scalings, shearings, reflections, perspective projection, orthogonal projection, and others.
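As a small illustration of how other transformations fit the same 4x4 form, the sketch below writes a translation and a scaling as matrices and applies both with the same multiplication routine. The helper names `mat_vec`, `translation` and `scaling` are hypothetical and chosen only for this example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Translation: the offsets sit in the fourth column."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def scaling(sx, sy, sz):
    """Scaling: the factors sit on the diagonal."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


p = [1.0, 2.0, 3.0, 1.0]  # the point (1, 2, 3) in homogeneous coordinates

moved = mat_vec(translation(10.0, 0.0, -5.0), p)
scaled = mat_vec(scaling(2.0, 2.0, 2.0), p)
print(moved)   # [11.0, 2.0, -2.0, 1.0]
print(scaled)  # [2.0, 4.0, 6.0, 1.0]
```

The translation only works because of the extra homogeneous component: the 1 in the fourth slot of the point is what picks up the offsets in the fourth column.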
Multiple transformations that should be applied one after another can also be combined into a single 4x4 matrix by matrix multiplication, for example rotations around the X, Y and Z axes, or the MVP matrix. The latter is the model-view-projection matrix, which transforms a 3D point from the local coordinate system of one object in the 3D scene into its final pixel coordinate on the screen. In these combined matrices all components can be non-zero.
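The combination step can be sketched as follows, assuming column vectors so that the matrix applied first stands rightmost in the product. The helper names (`mat_mul`, `mat_vec`, `rotation_z`, `translation`) are illustrative only.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rotation_z(angle_deg):
    """Rotation around the Z axis by the given angle in degrees."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def translation(tx, ty, tz):
    """Translation by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


R = rotation_z(90.0)
T = translation(5.0, 0.0, 0.0)
combined = mat_mul(T, R)  # rotate first, then translate

p = [1.0, 0.0, 0.0, 1.0]
step_by_step = mat_vec(T, mat_vec(R, p))  # two multiplications per point
in_one_go = mat_vec(combined, p)          # one multiplication per point
print(step_by_step)
print(in_one_go)
```

Both results agree up to floating-point rounding, at approximately (5, 1, 0). The combined matrix only has to be computed once and can then be reused for every point of the object.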
So the advantage is that a single operation, the vector-matrix multiplication, is usable for all these cases instead of several different operations. It is also performed efficiently on GPU hardware.