
I wanted to implement matrix multiplication with SIMD. Everything is fine when the matrices are filled with integer values, but there are issues when they are filled with floating-point values. The results are not quite correct. Here is my matrix representation:

union mat4
{
    struct
    {
        float E11, E12, E13, E14;
        float E21, E22, E23, E24;
        float E31, E32, E33, E34;
        float E41, E42, E43, E44;
    };
    struct
    {
        vec4 Line0;
        vec4 Line1;
        vec4 Line2;
        vec4 Line3;
    };
    vec4 Lines[4];
    float E[4][4];
    float V[16];
    __m128 I[4];
};

And my multiplication implementation:

inline mat4
operator*(const mat4& lhs, const mat4& rhs)
{
    mat4 res = {};

    __m128 v0 = {};
    __m128 v1 = {};
    __m128 v2 = {};
    __m128 v3 = {};

    for(int idx = 0; idx < 4; ++idx)
    {
        v0 = _mm_set1_ps(lhs.V[0+idx*4]);
        v1 = _mm_set1_ps(lhs.V[1+idx*4]);
        v2 = _mm_set1_ps(lhs.V[2+idx*4]);
        v3 = _mm_set1_ps(lhs.V[3+idx*4]);
        res.I[idx] = _mm_fmadd_ps(rhs.I[0], v0, res.I[idx]);
        res.I[idx] = _mm_fmadd_ps(rhs.I[1], v1, res.I[idx]);
        res.I[idx] = _mm_fmadd_ps(rhs.I[2], v2, res.I[idx]);
        res.I[idx] = _mm_fmadd_ps(rhs.I[3], v3, res.I[idx]);
    }

    return res;
}

I don’t think the issue is data alignment, and as far as I can tell everything is being filled in correctly. But the results are way off. I’ll be thankful for any help here.

Arheus
  • Please post a [mcve]. – n. m. could be an AI Aug 03 '23 at 15:09
  • This is already ready to use – Arheus Aug 03 '23 at 15:26
  • Not really, there are no includes, no main, no `vec4` definition, no mention of how it is compiled or to what target architecture, what test inputs were used, no expected results nor the ones actually produced. – Bob__ Aug 03 '23 at 15:50
  • @Bob__: [sse2] is an x86 extension. But you're right that it's not a [mcve] and we don't know which compiler. An actual MCVE would be something with code that can be copy/pasted into a local file and compiled without further editing. This is missing `#include ` among other things. Equally importantly, what kind of wrong? "The results are not quite correct" sounds like different rounding error that you'd expect from optimizing / vectorizing, but "way off" doesn't. – Peter Cordes Aug 03 '23 at 18:42
  • Use a library. Here’s a good one with MIT license: https://github.com/microsoft/DirectXMath/blob/dec2022/Inc/DirectXMathMatrix.inl#L227-L438 That multiplication function can use SSE1, AVX1, FMA3 or NEON, depending on the macros. – Soonts Aug 04 '23 at 10:55
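
The comments above ask for test inputs, expected results, and the kind of error observed. One way to produce those numbers is to compare the SIMD result against a plain scalar multiplication. The following is only a sketch under assumptions: `Mat4Ref`, `mul_ref`, `max_abs_diff`, and `self_check` are hypothetical names that do not appear in the question, and `Mat4Ref` stands in for the row-major `V[16]` view of `mat4`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical plain row-major 4x4 matrix, standing in for mat4's V[16] view.
struct Mat4Ref { float v[16]; };

// Textbook triple loop: out(r,c) = sum over k of a(r,k) * b(k,c).
inline Mat4Ref mul_ref(const Mat4Ref& a, const Mat4Ref& b)
{
    Mat4Ref out = {};
    for (int r = 0; r < 4; ++r)
    {
        for (int c = 0; c < 4; ++c)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.v[r * 4 + k] * b.v[k * 4 + c];
            out.v[r * 4 + c] = sum;
        }
    }
    return out;
}

// Largest absolute element-wise difference between two results.
inline float max_abs_diff(const Mat4Ref& x, const Mat4Ref& y)
{
    float worst = 0.0f;
    for (std::size_t i = 0; i < 16; ++i)
        worst = std::fmax(worst, std::fabs(x.v[i] - y.v[i]));
    return worst;
}

// Sanity check: multiplying by the identity must return the input unchanged.
inline void self_check()
{
    Mat4Ref id = {};
    for (int i = 0; i < 4; ++i)
        id.v[i * 4 + i] = 1.0f;

    Mat4Ref m = {};
    for (int i = 0; i < 16; ++i)
        m.v[i] = 0.5f + 0.25f * static_cast<float>(i);

    assert(max_abs_diff(mul_ref(id, m), m) == 0.0f);
    assert(max_abs_diff(mul_ref(m, id), m) == 0.0f);
}
```

A small `max_abs_diff` between the scalar and SIMD results (relative to the magnitude of the entries) would point to ordinary rounding differences, for example from fused multiply-add; a large one would point to a logic or data-layout bug.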

1 Answer


You read from `res.I` before writing to it.

This is a problem because `mat4 res = {};` does not make `I` the active member; instead, the first member (the unnamed struct with 16 `float` members) is active.

Reading from an inactive member of a union is undefined behavior unless the "common initial subsequence" rule is satisfied, which it is not here.
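
One way to sidestep the question of which union member is active is to keep the accumulator in a local `__m128` and move data in and out through the `float` array with explicit loads and stores. The sketch below is illustrative only and is not claimed to fix the results reported in the question: `Mat4f`, `mul_add`, `mul_simd`, and `self_check` are hypothetical names, `Mat4f` keeps only the `V[16]` view of `mat4`, and the non-fused fallback (used when the compiler is not targeting FMA) can round slightly differently from `_mm_fmadd_ps`.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical stand-in for the question's mat4: only the flat float view.
struct Mat4f { float V[16]; };

// Computes a * b + acc. Uses a fused multiply-add only when the compiler
// targets FMA (GCC/Clang define __FMA__ with -mfma); otherwise mul + add.
static inline __m128 mul_add(__m128 a, __m128 b, __m128 acc)
{
#if defined(__FMA__)
    return _mm_fmadd_ps(a, b, acc);
#else
    return _mm_add_ps(_mm_mul_ps(a, b), acc);
#endif
}

inline Mat4f mul_simd(const Mat4f& lhs, const Mat4f& rhs)
{
    // Load the four rows of rhs through the float array (no union punning).
    const __m128 r0 = _mm_loadu_ps(rhs.V + 0);
    const __m128 r1 = _mm_loadu_ps(rhs.V + 4);
    const __m128 r2 = _mm_loadu_ps(rhs.V + 8);
    const __m128 r3 = _mm_loadu_ps(rhs.V + 12);

    Mat4f res;
    for (int idx = 0; idx < 4; ++idx)
    {
        const float* row = lhs.V + idx * 4;

        // The accumulator is a local that is written before it is read.
        __m128 acc = _mm_mul_ps(r0, _mm_set1_ps(row[0]));
        acc = mul_add(r1, _mm_set1_ps(row[1]), acc);
        acc = mul_add(r2, _mm_set1_ps(row[2]), acc);
        acc = mul_add(r3, _mm_set1_ps(row[3]), acc);

        // Store the finished row back through the float array.
        _mm_storeu_ps(res.V + idx * 4, acc);
    }
    return res;
}

// Sanity check: identity * m must reproduce m exactly.
inline void self_check()
{
    Mat4f id = {};
    for (int i = 0; i < 4; ++i)
        id.V[i * 4 + i] = 1.0f;

    Mat4f m = {};
    for (int i = 0; i < 16; ++i)
        m.V[i] = 0.5f + 0.25f * static_cast<float>(i);

    const Mat4f out = mul_simd(id, m);
    for (int i = 0; i < 16; ++i)
        assert(out.V[i] == m.V[i]);
}
```

The unaligned load and store intrinsics are used so the sketch does not depend on how `Mat4f` happens to be aligned.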

Ben Voigt
  • All the major compilers that support Intel intrinsics (`__m128`) define the behaviour of union type-punning in C++ to be like in C99. At least GCC does explicitly, and it's widely used in MSVC code. So I don't think this explains bugs, unless they're using some obscure compiler like Sun CC which I seem to recall doesn't support something that the mainstream compilers do, perhaps it was union type-punning. – Peter Cordes Aug 03 '23 at 18:44