I'm seeing some perplexing missed optimizations from VS2019 in some performance-critical functions, and I'd like to know whether anyone can explain them before I submit a report to Microsoft.

All the XMM registers that must be preserved before use (the callee-saved ones, xmm6 and up) are being saved and restored unconditionally, even on a common short-circuit path that never touches them. It should be obvious to the optimizer that this path doesn't use those registers, so they shouldn't be saved and restored on that branch, right?

To me this is an obvious pessimization: it unconditionally performs unnecessary work and hurts the performance of the likely branch. I can't see the reason for it, so I have to wonder whether I'm missing something.

Can anyone make sense of this decision for me?

template <typename TExt>
inline typename TExt::ps_t __vectorcall Blah::blah(somePOD_t type, typename TExt::ps_t vals) noexcept
{
    // WTF?? stores preserved registers xmm6, xmm7, xmm8, xmm9, xmm10, xmm11...
    // ...
    // vmovaps     xmmword ptr [rax-xxh],xmm6
    // vmovaps     xmmword ptr [rax-xxh],xmm7
    // ...
    // vmovaps     xmmword ptr [rax-xxh],xmm11
    // ...

    if (type == 0) // [[likely]] doesn't help
        return SomeClassT<TExt::EXT>::someFunc(vals);
        // ...then immediately restores them and returns.  But they're not used on this branch!
        // ...
        // vmovaps     xmm6,xmmword ptr [r11-xxh]
        // vmovaps     xmm7,xmmword ptr [r11-xxh]
        // ...
        // vmovaps     xmm11,xmmword ptr [r11-xxh]
        // ...
    else
        // A bunch of switch statements and SIMD intrinsics for the other type cases.
        // Registers would only need to be preserved for code here!
}

VS: 16.11.8

Command:

/permissive- /MP /ifcOutput "x64\Release\" /GS /W4 /wd"4100" /wd"4189" /wd"4324" /wd"4458" /wd"4710" /Gy /Zc:wchar_t /I"<many libs>" /Zi /Gm- /O2 /Ob2 /sdl- /Fd"x64\Release\vc142.pdb" /Zc:inline /fp:precise /Zp8 /D "_USRDLL" /D "WIN32" /D "_WINDOWS" /D "VST" /D "NDEBUG" /D "_CRT_SECURE_NO_WARNINGS" /D "_WINDLL" /errorReport:prompt /WX- /Zc:forScope /GR- /arch:SSE2 /Gd /Oi /MT /std:c++17 /FC /Fa"x64\Release\" /EHa /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\My Blah.pch" /diagnostics:column 
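
For anyone who wants to experiment, here is a minimal, self-contained sketch of the same shape, restructured the way I'd try as a manual workaround: the heavy branch is moved into a separate non-inlined function, so that any save/restore of xmm6+ would belong to that function's prologue/epilogue rather than to the wrapper containing the early-out. All names here (`vec4`, `fast_path`, `slow_path`, `EXAMPLE_NOINLINE`) are hypothetical stand-ins, not the real code, and plain float arrays stand in for the SIMD type. I have not verified that this changes what 16.11.8 emits; the /Fa listing has to be checked to see whether it helps.

```cpp
#include <array>
#include <cassert>

// Hypothetical portability macro: keep the slow path out of line.
#if defined(_MSC_VER)
#  define EXAMPLE_NOINLINE __declspec(noinline)
#else
#  define EXAMPLE_NOINLINE __attribute__((noinline))
#endif

using vec4 = std::array<float, 4>;  // stand-in for TExt::ps_t

// Stand-in for SomeClassT<TExt::EXT>::someFunc: the cheap, common case.
inline vec4 fast_path(vec4 v) noexcept {
    for (float& x : v) x *= 2.0f;
    return v;
}

// Stand-in for the switch statements and intrinsics. Because this is a
// separate function, register pressure here is this function's problem,
// not the caller's.
EXAMPLE_NOINLINE vec4 slow_path(int type, vec4 v) noexcept {
    switch (type) {
        case 1:  for (float& x : v) x = x * x + 1.0f; break;
        case 2:  for (float& x : v) x = -x;           break;
        default: for (float& x : v) x = 0.0f;         break;
    }
    return v;
}

// The wrapper now contains only the early-out test and two calls.
inline vec4 blah(int type, vec4 v) noexcept {
    if (type == 0)
        return fast_path(v);
    return slow_path(type, v);
}

// Small self-check of the sketch's behaviour.
inline void demo() {
    const vec4 in{1.0f, 2.0f, 3.0f, 4.0f};
    assert((blah(0, in) == vec4{2.0f, 4.0f, 6.0f, 8.0f}));
    assert((blah(1, in) == vec4{2.0f, 5.0f, 10.0f, 17.0f}));
    assert((blah(2, in) == vec4{-1.0f, -2.0f, -3.0f, -4.0f}));
}
```

The trade-off is an extra call on the slow path, and it only works as long as the slow path really stays out of line (force-inlining it would put everything back into one function).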
Peter Cordes
Ry-Fi
  • What is your version of CL.EXE, and what is your compiler command line? – Eljay Jan 04 '22 at 21:26
  • It is likely that you are right. I don't think anything more meaningful can be spotted here without a minimal, complete, reproducible example. – Alex Guteniev Jan 05 '22 at 09:26
  • If you don't like MSVC's code-gen, use a better compiler, such as GCC or clang. They both know how to do "shrink wrap" optimization, at least in some cases, i.e. only save/restore registers after an early-out branch. – Peter Cordes Jan 05 '22 at 09:40

0 Answers