Why use a Matrix for 3D Projection?

Question

I looked up the calculations for a projection matrix (at least in OpenGL).

Why bother using a matrix when we have so many empty values? I count 9 entries marked as 0, and only 7 containing useful data. Why not just use a 1D array and store the data in a list-like shape? Wouldn't this save memory, and the time spent writing functions that manipulate matrices? I'm sure this argument applies to other topics as well, which makes me ask:

What is the specific reason for using Matrices in projecting 3D environments?

check this http://gamedev.stackexchange.com/questions/72044/why-do-we-use-4x4-matrices-to-transform-things-in-3d — concept3d, Apr 24 '16 at 12:50

datenwolf · Answer 1 · 2016-04-24T12:45:45.137

It's not just about the single values, it's also about the mathematical properties of a matrix. And the zeros are just as important as the nonzero values! The very layout of the values has meaning!

Specifically, the first three columns of a homogeneous transformation matrix (like a 3D projection matrix) form the basis vectors of the local coordinate space, and the 4th column defines a translation (which in the case of a perspective projection moves the basis away from the singularity point at the origin).

So in 3D space you have 3 values per position. You have to translate these three values into 3 values on your screen (the third value translates to a value that's used for depth comparison), and the 4th value (of the position and the destination) is used for perspective distortion. So for each of the 4 values in the original position you must know how much it contributes to each of the 4 values in the output. If it doesn't contribute (and that's just as important), this is 0. So you need 4 · 4 = 16 values in total. Hence a 4×4 matrix.

edited Apr 24 '16 at 12:45

answered Apr 24 '16 at 12:40

datenwolf

Could you please elaborate? – Apr 24 '16 at 12:42
I understand what you are getting at, but **why** use a Matrix over a simple 1D array consisting of 16 or less values? – Apr 24 '16 at 12:48

@frayment: Because you need the additional zeros, and in matrix form it is easier to read. The computer doesn't care about the shape; to it these are just 16 values. But there are certain constraints on how these are aligned in memory, and current GPUs are optimized for values arranged in groups of 4. So without extra preparation, 4 scalar floats will spread over as much memory as 16 floats for efficient access. In memory the values are actually linear (a flat array), but matrices are easier to read, so that's what high-level APIs give you. – datenwolf Apr 24 '16 at 12:55
@frayment: You keep saying "Matrix over a simple 1D array" as though a Matrix were some hugely complicated Cthulhu-esque perversion. It's a *simple type*, requiring very little complexity. – Nicol Bolas Apr 24 '16 at 14:19
@NicolBolas Haha. I just seem to have this undesirable sense that housing a secondary array inside of another reaps all the processing power slowing down execution. – Apr 24 '16 at 14:26
@frayment: I think you may be misled by bad code that implements matrices through two-level indirection (i.e. allocating an array of row pointers and then allocating memory for each row in turn). This is how newbies do it, and it's bad. When allocating memory for a matrix you just allocate one block of contiguous (linear address space) memory and do the rest via addressing. – datenwolf Apr 24 '16 at 14:58

tmlen · Accepted Answer · 2016-04-24T12:46:22.037

The projection of a 3D point (x,y,z) to the 2D image coordinates (X,Y) can be calculated as a vector-matrix multiplication in homogeneous coordinates:

[ a_00  a_01  a_02  a_03 ]   [ x ]    [ X W ]
[ a_10  a_11  a_12  a_13 ] * [ y ] =  [ Y W ]
[ a_20  a_21  a_22  a_23 ]   [ z ]    [ Z W ]
[ a_30  a_31  a_32  a_33 ]   [ 1 ]    [  W  ]

with

[ X W ]   [ x a_00 + y a_01 + z a_02 + a_03 ]
[ Y W ]   [ x a_10 + y a_11 + z a_12 + a_13 ]
[ Z W ] = [ x a_20 + y a_21 + z a_22 + a_23 ]
[  W  ]   [ x a_30 + y a_31 + z a_32 + a_33 ]

The pixel coordinates (X,Y) are then obtained by dividing the first and second rows by the fourth row. This step is the conversion from homogeneous to Cartesian coordinates.
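A minimal sketch of these two steps in plain Python, for illustration only. The names `mat_vec` and `project` and the sample matrix `M` are assumptions made for this example, not part of any OpenGL API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of 4 rows) by a 4-component vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def project(m, point):
    """Linear step in homogeneous coordinates, then divide by W."""
    x, y, z = point
    xw, yw, zw, w = mat_vec(m, [x, y, z, 1.0])
    if w == 0.0:
        raise ValueError("point maps to W = 0 and cannot be projected")
    return (xw / w, yw / w, zw / w)


# Hypothetical matrix whose fourth row is [0 0 -1 0], so W = -z.
M = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

print(project(M, (2.0, 4.0, -2.0)))  # (1.0, 2.0, -1.0)
```

Note that the zero entries do real work here: they state that, for example, y contributes nothing to the first output row.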

The third row of the OpenGL projection matrix is set up so that Z becomes the projected depth: z values between the near and far planes (n and f) are mapped to -1...1. It is then used for the depth test and clipping. Because the fourth row is [0 0 -1 0], the conversion from homogeneous to Cartesian coordinates corresponds to a division by -z, which results in the perspective transformation (with inverted depth).
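The depth mapping can be checked numerically. The sketch below assumes the third and fourth rows of a glFrustum-style perspective matrix; the function names are made up for this example, and eye-space z is taken as negative in front of the camera.

```python
def frustum_depth_rows(n, f):
    """Third and fourth rows of a glFrustum-style perspective matrix."""
    row_z = [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)]
    row_w = [0.0, 0.0, -1.0, 0.0]
    return row_z, row_w


def ndc_depth(z, n, f):
    """Projected depth Z for an eye-space z (negative in front of the camera)."""
    row_z, row_w = frustum_depth_rows(n, f)
    v = [0.0, 0.0, z, 1.0]  # x and y are irrelevant: their entries in both rows are 0
    zw = sum(a * b for a, b in zip(row_z, v))
    w = sum(a * b for a, b in zip(row_w, v))  # equals -z
    return zw / w


n, f = 1.0, 3.0
print(ndc_depth(-n, n, f))    # -1.0 at the near plane
print(ndc_depth(-f, n, f))    # 1.0 at the far plane
print(ndc_depth(-2.0, n, f))  # 0.5, so depth is not linear in z
```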

Any other way of expressing the projection would involve the same steps, namely the linear transformation followed by the division by depth (here -z) for the perspective foreshortening. Matrices are the usual representation in linear algebra for these operations.

This is not specific to perspective projections: many 3D transformations can be expressed using a 4x4 matrix, including rotations, translations, scalings, shearings, reflections, perspective projection, orthogonal projection, and others.

Multiple transformations that should be applied one after another can also be combined into a single 4x4 matrix by matrix multiplication. Examples are rotations around the X, Y and Z axes, or the MVP (model-view-projection) matrix, which transforms a 3D point from the local coordinate system of one object in the scene into its final pixel coordinate on the screen. In these combined matrices, all components can be non-zero.
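As a rough illustration of combining transformations, the sketch below multiplies a translation and a rotation into one matrix and checks that it gives the same result as applying them in sequence. All helper names are assumptions for this example, and matrices are plain row-major nested lists.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Product of a 4x4 matrix and a 4-component vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


T = translation(1.0, 2.0, 3.0)
R = rotation_z(math.pi / 2)
combined = mat_mul(T, R)  # rotate first, then translate

p = [1.0, 0.0, 0.0, 1.0]
step_by_step = mat_vec(T, mat_vec(R, p))
one_go = mat_vec(combined, p)

# Both routes agree up to floating-point rounding.
agree = all(abs(a - b) < 1e-12 for a, b in zip(step_by_step, one_go))
print(agree)  # True
```

The combined matrix is computed once and can then be reused for every vertex, which is the practical payoff of the matrix form.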

So the advantage is that a single operation, the vector-matrix multiplication, is usable for all these cases instead of several different operations. It is performed efficiently on GPU hardware.

"It is performed in an efficient way on GPU hardware." - Would this be better than a single-dimensional approach? — , Apr 24 '16 at 12:49
No. GPUs work with linear algebra, not vector spaces that map matrices to vectors in a completely arbitrary way. You can turn your matrix into a 1D array and write functions to perform matrix operations on it, but eventually you need to talk to OpenGL, which cannot understand your array, so you must convert it back to a matrix. This is a waste of CPU time for something that could be done very efficiently on the GPU, if the driver chooses to do so. — , Apr 24 '16 at 12:52
From the point of view of the hardware, there is no difference between a 4x4 matrix and a 16 element array. `m[i][j]` simply maps to `arr[4*i + j]`. The matrix in the programming language is internally represented as a 1D array. (like all objects). — tmlen, Apr 24 '16 at 12:58
@tmlen: "*From the point of view of the hardware, there is no difference between a 4x4 matrix and a 16 element array.*" Yes, there is. Well, for vectorized hardware there is. For more freeform SIMD hardware, you're correct. — Nicol Bolas, Apr 24 '16 at 14:21
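The index mapping from tmlen's comment can be sketched as follows. This assumes a row-major layout, as in `arr[4*i + j]`; OpenGL's own API conventionally expects column-major order, which would index as `arr[4*j + i]` instead. The names `at` and `mat_vec_flat` are made up for this example.

```python
ROWS = COLS = 4

# 16 values in one contiguous list, as they would sit in memory.
flat = [float(k) for k in range(ROWS * COLS)]

# The same values viewed as a 4x4 matrix (row-major).
nested = [flat[COLS * i: COLS * i + COLS] for i in range(ROWS)]


def at(arr, i, j):
    """Read element (i, j) from the flat array using plain address arithmetic."""
    return arr[COLS * i + j]


def mat_vec_flat(arr, v):
    """Matrix-vector product computed directly on the flat array."""
    return [sum(at(arr, i, j) * v[j] for j in range(COLS)) for i in range(ROWS)]


print(at(flat, 2, 3) == nested[2][3])  # True
```

So "matrix versus 1D array" is a question of how the 16 values are addressed, not of how many values are stored.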

score 0 · Answer 3 · answered Apr 24 '16 at 12:44

It's probably quite rare that the projection matrix would get used as-is. Typically, you're more likely to concatenate the projection matrix with the world and view matrices and multiply by the world-view-proj matrix all in one go.

Also, GPUs are powerful and flexible, but if there's one thing they're best at doing, it's a series of multiply-adds on vectors (although newer hardware is just as efficient with scalar multiply-adds as vector multiply-adds). Matrix-vector multiplies are just a series of vector multiply-adds, and a more compact structure might be less efficient.
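To make the "series of vector multiply-adds" concrete, here is a small sketch that computes a matrix-vector product as four multiply-adds over the matrix columns. It is plain Python meant only to show the arithmetic pattern, not how any particular GPU executes it; `multiply_add` and `mat_vec_by_columns` are hypothetical names.

```python
def multiply_add(scale, vec, acc):
    """One vector multiply-add: acc + scale * vec, component-wise."""
    return [a + scale * c for a, c in zip(acc, vec)]


def mat_vec_by_columns(columns, v):
    """Matrix-vector product as x*col0 + y*col1 + z*col2 + w*col3."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for scale, col in zip(v, columns):
        acc = multiply_add(scale, col, acc)
    return acc


# Columns of a translation by (1, 2, 3); the 4th column holds the offset.
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 2.0, 3.0, 1.0],
]

print(mat_vec_by_columns(columns, [5.0, 6.0, 7.0, 1.0]))  # [6.0, 8.0, 10.0, 1.0]
```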

That said, your point is not without merit. I am aware of one successful fixed-function games console that had limited hardware registers for the projection matrix, taking advantage of exactly your point that most entries in the projection matrix are typically unused.

"concatenate the projection matrix with the world and view" - Yes. I'm currently programming one right now. :) So you're saying that GPU's are better at calculating Matrices than 1D arrays?? — , Apr 24 '16 at 12:45
@frayment: I think you are confusing some things here. Matrices, like any other 2D data structure, _are_ stored as 1D data structures in the end in any case. And for standard matrices in the graphics context, a consecutive linear layout is the usual case. — derhass, Apr 24 '16 at 12:51
