I've previously dealt with the issue of vector/matrix multiplication in HLSL behaving entirely differently than expected, but I've transposed my matrices in my code to compensate, blissfully unaware of why this is necessary. But I really can't let this go.
The following summarizes my problem.
1. Create the projection matrix with XMMatrixPerspectiveFovLH, which gives a matrix that is the transposed projection matrix -- at least that's how it appears in memory (I've printed it).
2. Put this matrix into a constant buffer and view it as a matrix type in HLSL. Then the product of this matrix with a column vector (column vector on the right, see the documentation) actually performs the projection -- which seemingly contradicts the fact that the matrix passed into the shader is transposed (that is, the result should only have been correct if I had multiplied by a row vector).
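The steps above can be sketched numerically. A minimal pure-Python demonstration (using the values from the float4x4 literal below, with a hypothetical input vector): reading a row-major float array under column-major packing reconstructs the transpose, and multiplying that transpose by a column vector gives exactly the row-vector product `v * M` -- the convention DirectXMath uses. This is why the "transposed-looking" matrix projects correctly.

```python
# M as DirectXMath lays it out in memory (row-major storage).
M = [[1.358,  0.0,       0.0,     0.0],
     [0.0,    2.41421,   0.0,     0.0],
     [0.0,    0.0,       1.001,   1.0],
     [0.0,   -0.603553, -0.1001,  0.0]]

# Flatten to the raw 16-float buffer the cbuffer receives.
flat = [x for row in M for x in row]

# Column-major packing: element (r, c) lives at index c*4 + r.
G = [[flat[c*4 + r] for c in range(4)] for r in range(4)]

# G is exactly the transpose of M.
MT = [[M[c][r] for c in range(4)] for r in range(4)]
assert G == MT

def mat_vec(A, v):
    """A * column vector (mul(matrix, vector) in HLSL)."""
    return [sum(A[r][c] * v[c] for c in range(4)) for r in range(4)]

def vec_mat(v, A):
    """Row vector * A (the DirectXMath row-vector convention)."""
    return [sum(v[r] * A[r][c] for r in range(4)) for c in range(4)]

v = [1.0, 2.0, 3.0, 1.0]
# The "transposed" matrix times a column vector equals v * M.
assert mat_vec(G, v) == vec_mat(v, M)
```

So no explicit transpose happens at runtime: the bytes are simply reinterpreted under a different storage convention, and the math works out to the same transform.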
In a fit of rage, I manually wrote the matrix into a float4x4 in HLSL:

```hlsl
float4x4 m = { 1.358,  0.0,       0.0,     0.0,
               0.0,    2.41421,   0.0,     0.0,
               0.0,    0.0,       1.001,   1.0,
               0.0,   -0.603553, -0.1001,  0.0 };
```

and I got what should've happened to my cbuffer matrix: a weird transform. Surely, if the HLSL compiler did not generate some code to transpose my matrix, there should be no difference between the two results.
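One thing worth checking here: an HLSL initializer list fills a float4x4 row by row, regardless of the storage qualifier. So the hand-written literal builds the logical matrix P, while the cbuffer path (default column-major packing over the same 16 floats) yields P transposed. A quick pure-Python check, using the values from the literal and a hypothetical input vector, confirms the two transforms really do differ:

```python
# P: the logical matrix the initializer list builds (filled row by row).
P = [[1.358,  0.0,       0.0,     0.0],
     [0.0,    2.41421,   0.0,     0.0],
     [0.0,    0.0,       1.001,   1.0],
     [0.0,   -0.603553, -0.1001,  0.0]]

# PT: what the cbuffer path yields from the same bytes.
PT = [[P[c][r] for c in range(4)] for r in range(4)]

def mat_vec(A, v):
    """A * column vector, as in mul(matrix, vector)."""
    return [sum(A[r][c] * v[c] for c in range(4)) for r in range(4)]

v = [0.5, 0.5, 5.0, 1.0]  # hypothetical view-space point
assert mat_vec(P, v) != mat_vec(PT, v)  # the "weird transform" vs the projection
```

The two matrices happen to share their first row/column, so some components agree, but the vector as a whole comes out differently -- consistent with the garbled-but-not-random transform described above.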
See here for what should've been an answer to my question, but I'm not sure about the accepted answer, namely this part:

> And it turns out that for some reason, in D3D9 HLSL, mul always expects matrices to be stored in column-major order. However, the D3DX math library stores matrices in row-major order, and as the documentation says, ID3DXBaseEffect::SetMatrix() expects its input in row-major order. It does a transpose behind the scenes to prepare the matrix for use with mul.
Does this mean that HLSL is automatically transposing matrices? If so, does it do this to exactly those matrices passed into the shaders, and not to any matrices defined within the shader code itself? How can I know for certain that this is true? And finally, if this is the case, why is it done at all? Why not just expect the matrices passed into the shader to be in the correct format in the first place? It seems to me like this is a small performance hit for no reason.
Edit: I've found a way to "fix" this. Using the row_major keyword forces mul to behave as expected under standard math convention. It seems that this keyword alters how the data is packed into registers: it stores each row in a register, which presumably then takes a dot product with the vector being transformed. If true, this reduces my question to: is it faster to store the values in registers consecutively by row, or "interleaved" by column? I'm interested to know how it could be faster by column.
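The two register layouts in question can be sketched as follows (pure Python, hypothetical matrix and vector values). With row_major packing each register holds a row and mul reduces to four dot products; with the default column_major packing each register holds a column and mul reduces to scaling each column register by one vector component and accumulating. Both compute the same M * v:

```python
# Hypothetical matrix, viewed as four "registers" under each layout.
rows = [[1.0,  2.0,  3.0,  4.0],
        [5.0,  6.0,  7.0,  8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
cols = [list(c) for c in zip(*rows)]  # column_major: one column per register

v = [1.0, 0.5, 0.25, 2.0]

# row_major layout: four dot products (one dp4 per row register).
dp = [sum(r[i] * v[i] for i in range(4)) for r in rows]

# column_major layout: multiply one column register by one vector
# component, then accumulate (one mul plus three mads).
acc = [0.0] * 4
for col, s in zip(cols, v):
    acc = [a + s * x for a, x in zip(acc, col)]

# Same result either way; only the instruction pattern differs.
assert all(abs(a - b) < 1e-9 for a, b in zip(dp, acc))
```

In both cases the work is four vector instructions, which is why the choice of layout is more about instruction selection (dp4 vs. mul/mad chains) than raw operation count.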