I've previously dealt with the issue of vector/matrix multiplication in HLSL behaving entirely differently than expected, but I've transposed my matrices in my code to compensate, blissfully unaware of why this is necessary. But I really can't let this go.
The following summarizes my problem.
1. Create the projection matrix with XMMatrixPerspectiveFovLH, which gives a matrix that is the transposed projection matrix -- at least that's how it appears in memory (I've printed it).
2. Put this matrix into a constant buffer and view it as a matrix type in HLSL. Then the product of this matrix with a column vector (column vector on the right, see the documentation) actually performs the projection -- which seemingly contradicts the fact that the matrix passed into the shader is transposed (that is, the result should only have been correct if I had multiplied by a row vector).
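The steps above can be sketched numerically. A minimal pure-Python demonstration (using the values from the float4x4 literal below, with a hypothetical input vector): reading a row-major float array under column-major packing reconstructs the transpose, and multiplying that transpose by a column vector gives exactly the row-vector product `v * M` -- the convention DirectXMath uses. This is why the "transposed-looking" matrix projects correctly.

```python
# M as DirectXMath lays it out in memory (row-major storage).
M = [[1.358,  0.0,       0.0,     0.0],
     [0.0,    2.41421,   0.0,     0.0],
     [0.0,    0.0,       1.001,   1.0],
     [0.0,   -0.603553, -0.1001,  0.0]]

# Flatten to the raw 16-float buffer the cbuffer receives.
flat = [x for row in M for x in row]

# Column-major packing: element (r, c) lives at index c*4 + r.
G = [[flat[c*4 + r] for c in range(4)] for r in range(4)]

# G is exactly the transpose of M.
MT = [[M[c][r] for c in range(4)] for r in range(4)]
assert G == MT

def mat_vec(A, v):
    """A * column vector (mul(matrix, vector) in HLSL)."""
    return [sum(A[r][c] * v[c] for c in range(4)) for r in range(4)]

def vec_mat(v, A):
    """Row vector * A (the DirectXMath row-vector convention)."""
    return [sum(v[r] * A[r][c] for r in range(4)) for c in range(4)]

v = [1.0, 2.0, 3.0, 1.0]
# The "transposed" matrix times a column vector equals v * M.
assert mat_vec(G, v) == vec_mat(v, M)
```

So no explicit transpose happens at runtime: the bytes are simply reinterpreted under a different storage convention, and the math works out to the same transform.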
In a fit of rage, I manually wrote the matrix into a float4x4 in HLSL:

```hlsl
float4x4 m = { 1.358,  0.0,       0.0,     0.0,
               0.0,    2.41421,   0.0,     0.0,
               0.0,    0.0,       1.001,   1.0,
               0.0,   -0.603553, -0.1001,  0.0 };
```

and I got what should've happened to my cbuffer matrix: a weird transform. Surely, if the HLSL compiler did not generate some code to transpose my matrix, there should be no difference between the two results.
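One thing worth checking here: an HLSL initializer list fills a float4x4 row by row, regardless of the storage qualifier. So the hand-written literal builds the logical matrix P, while the cbuffer path (default column-major packing over the same 16 floats) yields P transposed. A quick pure-Python check, using the values from the literal and a hypothetical input vector, confirms the two transforms really do differ:

```python
# P: the logical matrix the initializer list builds (filled row by row).
P = [[1.358,  0.0,       0.0,     0.0],
     [0.0,    2.41421,   0.0,     0.0],
     [0.0,    0.0,       1.001,   1.0],
     [0.0,   -0.603553, -0.1001,  0.0]]

# PT: what the cbuffer path yields from the same bytes.
PT = [[P[c][r] for c in range(4)] for r in range(4)]

def mat_vec(A, v):
    """A * column vector, as in mul(matrix, vector)."""
    return [sum(A[r][c] * v[c] for c in range(4)) for r in range(4)]

v = [0.5, 0.5, 5.0, 1.0]  # hypothetical view-space point
assert mat_vec(P, v) != mat_vec(PT, v)  # the "weird transform" vs the projection
```

The two matrices happen to share their first row/column, so some components agree, but the vector as a whole comes out differently -- consistent with the garbled-but-not-random transform described above.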
See here for what should've been an answer to my question, but I'm not sure about the accepted answer, namely this part:

> And it turns out that for some reason, in D3D9 HLSL, mul always expects matrices to be stored in column-major order. However, the D3DX math library stores matrices in row-major order, and as the documentation says, ID3DXBaseEffect::SetMatrix() expects its input in row-major order. It does a transpose behind the scenes to prepare the matrix for use with mul.
Does this mean that HLSL is automatically transposing matrices? If so, does it do this to exactly those matrices passed into the shaders, and not to any matrices defined within the shader code itself? How can I know for certain that this is true? And finally, if this is the case, why is it done at all? Why not just expect the matrices passed into the shader to be in the correct format in the first place? It seems to me like this is a small performance hit for no reason.
Edit: I've found a way to "fix" this. Using the row_major keyword forces mul to behave as expected under standard math convention. It seems that this keyword alters how the data is packed into registers: it stores each row in a register, which presumably then takes a dot product with the vector being transformed. If true, this reduces my question to: is it faster to store the values in registers consecutively by row, or "interleaved" by column? I'm interested to know how it could be faster by column.
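The two register layouts in question can be sketched as follows (pure Python, hypothetical matrix and vector values). With row_major packing each register holds a row and mul reduces to four dot products; with the default column_major packing each register holds a column and mul reduces to scaling each column register by one vector component and accumulating. Both compute the same M * v:

```python
# Hypothetical matrix, viewed as four "registers" under each layout.
rows = [[1.0,  2.0,  3.0,  4.0],
        [5.0,  6.0,  7.0,  8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
cols = [list(c) for c in zip(*rows)]  # column_major: one column per register

v = [1.0, 0.5, 0.25, 2.0]

# row_major layout: four dot products (one dp4 per row register).
dp = [sum(r[i] * v[i] for i in range(4)) for r in rows]

# column_major layout: multiply one column register by one vector
# component, then accumulate (one mul plus three mads).
acc = [0.0] * 4
for col, s in zip(cols, v):
    acc = [a + s * x for a, x in zip(acc, col)]

# Same result either way; only the instruction pattern differs.
assert all(abs(a - b) < 1e-9 for a, b in zip(dp, acc))
```

In both cases the work is four vector instructions, which is why the choice of layout is more about instruction selection (dp4 vs. mul/mad chains) than raw operation count.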