
I recently saw that Visual Studio 2019 Preview added an option to compile for AVX512. I tried it and it worked. But why does it work when my CPU has no such capability?

I am using the following C++ program to detect CPU capabilities: https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=vs-2019

All AVX512 flags (AVX512F, AVX512CD, AVX512PF and AVX512ER) are reported as unavailable on my system when I run this program.
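For reference, here is a minimal sketch (my own, not the full Microsoft sample linked above) of how those feature bits can be read with `__cpuidex` in an x86/x64 MSVC build:

```cpp
// Reads CPUID leaf 7, sub-leaf 0, and tests the AVX-512 feature bits in EBX.
#include <intrin.h>
#include <cstdio>

int main() {
    int regs[4];                       // EAX, EBX, ECX, EDX
    __cpuidex(regs, 7, 0);             // structured extended feature flags
    const int ebx = regs[1];
    std::printf("AVX512F : %d\n", (ebx >> 16) & 1);
    std::printf("AVX512PF: %d\n", (ebx >> 26) & 1);
    std::printf("AVX512ER: %d\n", (ebx >> 27) & 1);
    std::printf("AVX512CD: %d\n", (ebx >> 28) & 1);
    // A complete check would first confirm that leaf 7 exists (CPUID leaf 0)
    // and also verify OSXSAVE/_xgetbv so the OS actually saves ZMM state.
    return 0;
}
```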

Visual Studio 2019 Preview offers the following instruction-set options: AVX, AVX2, AVX512, SSE and SSE2. Software compiled with AVX, AVX2, SSE or SSE2 works on my PC, and the detection program above says that my CPU supports all four.

So the only puzzle is AVX512: software compiled with that option runs on my PC, yet every detection program I run says my CPU has no AVX512.

Thanks!

  • My CPU is Intel Core i7 4790K – Hattrick HKS Nov 06 '19 at 14:07
  • Did you actually use any AVX512 intrinsics, or could it be that your program just didn't have any parts that the compiler would use AVX512 for? – PeterT Nov 06 '19 at 14:47
  • Check the generated assembly by turning on assembly output. AFAIK the current compiler version as of this comment does not emit AVX-512 unless you use intrinsics, and doesn't do so correctly even then. They are aware [of that issue](https://developercommunity.visualstudio.com/content/problem/748361/incorrect-code-generation-with-visual-studio-2019.html) – Mgetz Nov 06 '19 at 14:49
  • [This defect may also be relevant](https://developercommunity.visualstudio.com/content/problem/787296/vs2019-163-seems-to-incorrectly-detect-avx512-on-w.html) apparently the compiler may misdetect the ISA on some operating systems – Mgetz Nov 06 '19 at 14:55

2 Answers

4

Presumably the compiler chose not to actually use any AVX512 instructions when auto-vectorizing. Or only in functions that don't get called in your test-cases.

Enabling AVX512 means the compiler can choose to use AVX512 instructions, not that it definitely will. If it doesn't, the resulting binary contains no instructions that will fault on CPUs without AVX512.
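As an illustration (a hypothetical function, not from the question): even when built with the AVX512 option, MSVC is free to compile a loop like this with 256-bit YMM or plain scalar instructions, in which case the binary never executes an AVX-512 instruction and runs fine on a CPU without it.

```cpp
// Hypothetical example: nothing here *requires* AVX-512. Under the AVX512
// option the compiler *may* vectorize this with ZMM registers, or it may
// use 256-bit YMM or scalar code; check the generated assembly to find out.
void add_arrays(float* dst, const float* a, const float* b, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```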


I don't know what MSVC's default tuning options are, but using 512-bit vectors isn't always profitable, especially for programs that spend most of their time in scalar code. (Running a 512-bit uop reduces max turbo for the next few milliseconds on current Skylake-X CPUs that do support AVX512.)

For 256-bit vectors, sometimes it's useful to use an AVX512VL instruction (EVEX encoding) like combining multiple boolean ops with vpternlogd, or one of the new shuffles like vpermt2d. Or an EVEX encoding of an instruction available in AVX2 or earlier just to use more registers (ymm16..31) or for masked operations.
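To make that concrete, here is a sketch (my own example, not from the answer): with AVX512VL, two boolean operations fold into a single vpternlogd even on 256-bit vectors, using the standard truth-table constant 0xEA for (a & b) | c.

```cpp
#include <immintrin.h>

// AVX512F + AVX512VL: one EVEX vpternlogd computes (a & b) | c in one shot.
// Truth-table constant: with A=0xF0, B=0xCC, C=0xAA, (A & B) | C = 0xEA.
__m256i and_then_or(__m256i a, __m256i b, __m256i c) {
    return _mm256_ternarylogic_epi32(a, b, c, 0xEA);
}

// The same result with only AVX2 takes two instructions.
__m256i and_then_or_avx2(__m256i a, __m256i b, __m256i c) {
    return _mm256_or_si256(_mm256_and_si256(a, b), c);
}
```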

Or maybe none of your loops auto-vectorized, or maybe you didn't use an optimization level high enough to even try to auto-vectorize.

Peter Cordes
  • AFAIK the current version of the compiler as of this answer is not designed to use AVX512 except via intrinsics – Mgetz Nov 06 '19 at 14:52
  • @Mgetz: oh really? MSVC didn't even support AVX512 intrinsics until now? Wow that's far behind. I wonder if/when they'll just drop their own compiler (or at least the back-end) and switch to clang. – Peter Cordes Nov 06 '19 at 15:08
  • @Mgetz: That would explain it, but why would an AVX512 option even exist? MSVC and ICC don't require you to enable extensions before using intrinsics for them (unlike gcc/clang). There's no penalty for mixing VEX and EVEX, and in fact you *should* use the shorter VEX encoding whenever possible for 256-bit and 128-bit instructions. I guess this could enable register allocation to include `ymm16..31` when doing code-gen for functions that don't use AVX512-specific intrinsics; that's a useful feature that you would need an option for because there's no separate intrinsic. – Peter Cordes Nov 06 '19 at 15:13
  • @PeterCordes Yes, the flag enables the other 16 registers. – Mysticial Nov 06 '19 at 16:52
0

MSVC's compiler is a multi-versioning auto-vectorizer: when you specify AVX-512 code generation it will also generate AVX2, AVX, SSE, MMX, and pure scalar fallback code, and it will add a run-time check for the highest instruction set available.

See the Auto-Vectorizer Section: https://learn.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-160

Please note that this does not happen for intrinsic functions such as:

__m256 _mm256_add_ps(__m256 a, __m256 b); // AVX 256-bit packed single-precision add
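As a sketch of that point (my own illustration, not from the answer): a function written with an AVX intrinsic compiles directly to the corresponding VEX-encoded instruction, with no CPUID check or scalar fallback, so calling it on a CPU without AVX raises an illegal-instruction fault.

```cpp
#include <immintrin.h>

// The intrinsic maps straight to vaddps ymm, ymm, ymm -- unconditionally.
__m256 add8(__m256 a, __m256 b) {
    return _mm256_add_ps(a, b);
}
```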
dave_thenerd
  • Comments on the other answer indicate that MSVC *will* sometimes use x/ymm16..31 when AVX-512 is enabled, even if you don't use any `_mm512` intrinsics. – Peter Cordes Jul 30 '21 at 02:41
  • Also, no, MSVC doesn't auto-version when you just use `-arch:AVX2`. e.g. https://godbolt.org/z/qvTvxbGjY shows an array-sum function where x64 MSVC 19.14 unconditionally uses AVX2 instructions, with no checking or fallback. – Peter Cordes Jul 30 '21 at 02:41