
Does anyone know of a fix for an MSVC compiler bug/annoyance where SIMD Extension settings get "stuck" on AVX?

The context of this question is coding up SIMD CPU dispatchers, closely following Agner's well-known dispatch_example2.cpp project. I've been going back and forth in three different MSVC projects and have dead-ended with this issue in two of them, after which one of those two "fixed itself" somehow.

The question is pretty simple: To compile the dispatchers I need to compile 4 times with

/arch:AVX512 /DINSTRSET=10
/arch:AVX2 /DINSTRSET=8
/arch:AVX /DINSTRSET=7
/arch:SSE2 /D__SSE4_2__

While I'm doing this I'm watching the value of INSTRSET and this code:

#if defined ( __AVX512VL__ ) && defined ( __AVX512BW__ ) && defined ( __AVX512DQ__ )
#define AVX512_FLAG 1
#else
#define AVX512_FLAG 2
#endif

#if defined ( __AVX2__ )
#define AVX2_FLAG 1
#else
#define AVX2_FLAG 2
#endif

#if defined ( __AVX__ )
#define AVX_FLAG 1
#else
#define AVX_FLAG 2
#endif
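
A minimal sketch of how the same check could be turned into a compile-time guard, so that a translation unit whose /arch setting does not match its INSTRSET value fails to build instead of crashing later. The names kDetectedLevel, kRequestedLevel and arch_matches_request are hypothetical and only cover the four levels used in this question (10, 8, 7, and anything below 7):

```cpp
// Sketch: cross-check the INSTRSET value passed with /D against the SIMD
// macros the compiler predefined for this translation unit.
// kDetectedLevel, kRequestedLevel and arch_matches_request are hypothetical
// names introduced for this illustration.

#if defined(__AVX512VL__) && defined(__AVX512BW__) && defined(__AVX512DQ__)
constexpr int kDetectedLevel = 10;
#elif defined(__AVX2__)
constexpr int kDetectedLevel = 8;
#elif defined(__AVX__)
constexpr int kDetectedLevel = 7;
#else
constexpr int kDetectedLevel = 0;  // no AVX-family macro is predefined
#endif

// For AVX and above the detected level must equal the requested one.
// Below AVX (requested < 7) no AVX-family macro may be defined at all.
constexpr bool arch_matches_request(int requested, int detected) {
    return requested >= 7 ? detected == requested : detected == 0;
}

#ifdef INSTRSET
constexpr int kRequestedLevel = INSTRSET;
static_assert(arch_matches_request(kRequestedLevel, kDetectedLevel),
              "INSTRSET does not match the /arch setting of this file");
#endif
```

With a guard like this, the "stuck on AVX" case (INSTRSET == 6 while __AVX2__ is defined) would stop the build at the offending file.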

The behavior is like this: For the three AVX compiles everything is exactly as expected. When the problem is not happening, the SSE2 compile shows as expected (AVX512_FLAG, AVX2_FLAG, AVX_FLAG == 2) and the final code runs fine.

When the problem is happening, for the /arch:SSE2 /D__SSE4_2__ compile the code above shows AVX512_FLAG == 2 but AVX2_FLAG == AVX_FLAG == 1 and INSTRSET == 8, and the compiler thinks the AVX2 instructions are enabled - the project compiles, but crashes on an SSE4.2 machine.

If I try /arch:SSE2 /DINSTRSET=6 then I get INSTRSET == 6 for the compile, but the code above still shows AVX2_FLAG == 1 and AVX_FLAG == 1, and the final project still crashes on an SSE4.2 machine.

The crashes happen even if I don't run any vector code - anything that calls into the dispatcher crashes immediately even if all vector code is short circuited.

FYI, trying /DINSTRSET=6 is just an act of desperation - I've never gotten anything to work with SSE4.2 without using /D__SSE4_2__

Does anyone know how to fix this problem that is completely halting my progress? Tried "Clean Solution" already.

Peter Cordes
dts

2 Answers


If you want a single binary that works on SSE-only computers but can leverage AVX when available, you need to do the following.

  1. At the project level, set “Enable enhanced instruction set: Not set” if you’re building for Win64, or “SSE2” if you’re building for Win32.

  2. Set “Enable enhanced instruction set: AVX” (or AVX2) only on the *.cpp files that contain the AVX versions of your functions.

  3. Make sure you never call these AVX functions unless both the CPU and the OS (see the GetEnabledXStateFeatures WinAPI) actually support them.
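
As a rough illustration of step 3, here is a sketch of the CPU-plus-OS check. It swaps in the lower-level CPUID/XGETBV test for the WinAPI call mentioned in the list, and the function names avx_usable and avx_usable_on_this_machine are hypothetical. The decision logic is a pure function so it can be checked without AVX hardware; the MSVC-only part reads the real register values with the __cpuid and _xgetbv intrinsics:

```cpp
#include <cstdint>

// CPUID leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX.
// XCR0: bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state.
// AVX is only safe to use when the CPU reports it AND the OS saves YMM state.
constexpr bool avx_usable(std::uint32_t cpuid1_ecx, std::uint64_t xcr0) {
    return ((cpuid1_ecx >> 27) & 1u) != 0 &&
           ((cpuid1_ecx >> 28) & 1u) != 0 &&
           (xcr0 & 0x6u) == 0x6u;
}

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>

// MSVC-only helper (hypothetical name) that feeds real values into the check.
inline bool avx_usable_on_this_machine() {
    int regs[4];
    __cpuid(regs, 1);
    const std::uint32_t ecx = static_cast<std::uint32_t>(regs[2]);
    // XGETBV itself faults if the OS has not enabled XSAVE, so test that first.
    if (((ecx >> 27) & 1u) == 0) return false;
    return avx_usable(ecx, _xgetbv(0));
}
#endif
```

This file should be compiled without any AVX /arch setting, since it has to run on machines that lack AVX.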

Practically speaking, instead of compiling the same source file multiple times with different settings, compile 4 different source files. They can contain the same code, since C++ has the #include preprocessor directive. If you have a single implementation dispatched with these macros, move that implementation into a *.inl or *.hpp file and include that file in 4 different *.cpp files, one per CPU target.
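
The layout described above might look roughly like the following single-file sketch. In a real project each kernel would live in its own *.cpp file with its own /arch setting and would pull in the shared body with #include; here a macro (DEFINE_SUM_KERNEL, a hypothetical name) stands in for that include so the example stays self-contained. The kernel names, the SumFn alias and select_sum are likewise assumptions, and the level numbers follow the INSTRSET values from the question:

```cpp
#include <cstddef>

// Stand-in for a shared implementation file: every kernel gets the same
// source body. In the real layout this would be `#include "impl.inl"` inside
// four differently named .cpp files, each compiled with a different /arch.
#define DEFINE_SUM_KERNEL(name)                            \
    float name(const float* data, std::size_t n) {         \
        float total = 0.0f;                                \
        for (std::size_t i = 0; i < n; ++i) total += data[i]; \
        return total;                                      \
    }

DEFINE_SUM_KERNEL(sum_sse2)    // would be built with /arch:SSE2 (or not set)
DEFINE_SUM_KERNEL(sum_avx)     // would be built with /arch:AVX
DEFINE_SUM_KERNEL(sum_avx2)    // would be built with /arch:AVX2
DEFINE_SUM_KERNEL(sum_avx512)  // would be built with /arch:AVX512

using SumFn = float (*)(const float*, std::size_t);

// Pick the best kernel for a runtime-detected instruction set level.
// Levels follow the question: 10 = AVX512, 8 = AVX2, 7 = AVX, lower = SSE.
SumFn select_sum(int level) {
    if (level >= 10) return sum_avx512;
    if (level >= 8) return sum_avx2;
    if (level >= 7) return sum_avx;
    return sum_sse2;
}
```

The dispatcher itself belongs in a file built with the lowest /arch setting, so that merely calling select_sum never executes an AVX instruction.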

Soonts
  • Thanks for the comment. As I said, I've been closely following Agner's dispatch_example2.cpp example for CPU dispatching (http://www.agner.org/optimize/#vectorclass), and it works fine when this issue is not happening. I have two projects open in front of me, one where this issue is NOT happening and the code compiles and performs as expected, and the other where this issue IS happening and the project compiles to junk. – dts Jan 06 '22 at 20:38
  • @dts There’s nothing wrong with VC++; that approach is just not going to work with the toolset. For VC++, you need to do what I wrote in that answer: instead of compiling the same `*.cpp` file 4 times with different settings, compile 4 different (and differently named) `*.cpp` files, each with its own compiler settings. – Soonts Jan 06 '22 at 20:52
  • Thanks for your comment. I'm not sure exactly what you are getting at, can you please clarify? Are you saying it works with GCC and Clang, but not MSVC? As I said, I have an example in front of me that is working fine. Compile 3 different object files for AVX512, AVX2 and AVX with the vector code, then compile the whole project with SSE4.2. Dispatcher is functioning as expected. – dts Jan 06 '22 at 21:07
  • @dts If you don’t need incremental builds, incremental linker, multiprocessor compilation, msbuild, and you’re willing to compile and link from command line, you can probably do that too. However, when people develop Windows software with VC++, they usually do want all these higher-level things, visual studio, msbuild, etc. As you found out, these tools aren’t happy with the hack used in that sample code. – Soonts Jan 06 '22 at 21:17

I figured this out (it's simple and boring). For the incremental object files I'm compiling 3 .obj files from the same .cpp (the .cpp with the vector code). When the MSVC SIMD settings are changed in the project-level Properties, they may or may not get inherited by the .cpp file's own Properties; a value set explicitly on the file overrides the project-level one. This is where the project gets "stuck" on AVX (sometimes, not always). The fix is to check the .cpp file's properties and make sure they are correct for each build.

BTW I'm using VS 2019, /std:c++17 and the context above is the 32-bit build.

dts