I am planning to implement runtime detection of SIMD extensions. If I find that the processor supports AVX2, is it also guaranteed to support SSE4.2 and AVX?
3 Answers
Support for a more-recent Intel SIMD ISA extension implies support for previous SIMD ones.
AVX2 definitely implies AVX1.
I think AVX1 implies that all of the SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2 feature bits must also be set in CPUID. Even if that is not formally guaranteed, many things make this assumption, and a CPU that violated it would probably not be commercially viable for general use.
Note that `popcnt` has its own feature bit, so in theory you could have a CPU with AVX2 and SSE4.2 but not `popcnt`, but many things treat SSE4.2 as implying `popcnt`. So it's more like you can advertise support for `popcnt` without SSE4.2.
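If you'd rather not rely on any of these implications, a minimal sketch of runtime detection that queries every bit separately could look like this. It assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available; the struct and function names (`simd_features`, `detect_simd`, `can_use_avx2_path`) are made up for illustration and are not part of any library.

```c
#include <stdbool.h>

/* Illustrative struct; the name and fields are assumptions, not a library API. */
struct simd_features {
    bool sse42;
    bool popcnt;
    bool avx;
    bool avx2;
};

/* Query each feature on its own instead of inferring it from AVX2. */
static struct simd_features detect_simd(void)
{
    struct simd_features f = { false, false, false, false };
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse42  = __builtin_cpu_supports("sse4.2") != 0;
    f.popcnt = __builtin_cpu_supports("popcnt") != 0;
    f.avx    = __builtin_cpu_supports("avx") != 0;
    f.avx2   = __builtin_cpu_supports("avx2") != 0;
#endif
    /* On other compilers or architectures every field stays false. */
    return f;
}

/* Enable a code path that mixes AVX2, AVX, SSE4.2 and popcnt
   only if all four features are reported. */
static bool can_use_avx2_path(struct simd_features f)
{
    return f.avx2 && f.avx && f.sse42 && f.popcnt;
}
```

Calling `can_use_avx2_path(detect_simd())` once at startup and caching the result costs almost nothing, and it sidesteps the question of what is formally guaranteed.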
In theory, you could make a CPU (or virtual machine) with AVX but which didn't accept the non-VEX legacy-SSE encoding of SSE4.2 instructions like `pcmpistri`, but I think you'd be violating Intel's guarantees about what the AVX feature bit implies. Not sure if that's formally written down in a manual, but most software will assume that.
But AVX1 does imply support for the VEX encoding of all SSE4.2 and earlier SIMD instructions, e.g. `vpcmpistri` or `vminss`.
`gcc -mavx2` definitely implies AVX1 and previous extensions, but will only emit code that uses the VEX encoding. It will define the `__SSE4_2__` macro and so on, though, so GCC does treat AVX2 as implying earlier SSE extensions and `popcnt`, but not FMA, AES-NI, or PCLMUL. Those are separate features even for GCC.
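To see which macros a given set of flags turns on, something like the following sketch can help. It only relies on the predefined macros `__SSE4_2__`, `__POPCNT__`, `__AVX__`, `__AVX2__` and `__FMA__` as GCC and Clang define them; the enum constants and the function name `compile_time_features` are hypothetical.

```c
/* Bit flags for the features we look at; the names are illustrative only. */
enum {
    F_SSE42  = 1u << 0,
    F_POPCNT = 1u << 1,
    F_AVX    = 1u << 2,
    F_AVX2   = 1u << 3,
    F_FMA    = 1u << 4
};

/* Report which ISA-extension macros the compiler predefined for this build. */
static unsigned compile_time_features(void)
{
    unsigned mask = 0;
#ifdef __SSE4_2__
    mask |= F_SSE42;
#endif
#ifdef __POPCNT__
    mask |= F_POPCNT;
#endif
#ifdef __AVX__
    mask |= F_AVX;
#endif
#ifdef __AVX2__
    mask |= F_AVX2;
#endif
#ifdef __FMA__
    mask |= F_FMA;
#endif
    return mask;
}
```

Building this with `-mavx2` alone and then with `-mavx2 -mfma` should show the FMA bit changing while the others stay the same, which matches the point that FMA is a separate feature.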
(In practice you should use `gcc -march=native` or `gcc -march=znver1` or whatever to enable all the features your CPU has, and set tuning options for it. Not just `-mavx2 -mfma`, which leaves tuning settings at bad defaults like splitting every possibly-unaligned 256-bit load/store into 128-bit halves.)
(Note that MSVC doesn't have as many SIMD ISA detection macros; it has one for AVX but not for all of the earlier SSE* extensions. MSVC's model is designed around the assumption that programs will do runtime CPU detection instead of being compiled for the local machine. Although MSVC does now have AVX and AVX2 options to use those as baselines.)
Note that AVX512 kind of breaks the tradition. AVX512F implies support for AVX2 and everything before it, but beyond that, AVX512DQ doesn't come "before" or "after" AVX512ER, for example. You can (in theory) have either, both, or neither. (In practice, Skylake-X/Cannonlake/etc. has only a bit of overlap with Xeon Phi (Knights Landing / Knights Mill) beyond AVX512F. See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512)
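Because the AVX512 subsets are independent, each one a kernel uses needs its own check. Here is a rough sketch, again assuming GCC or Clang on x86 with `__builtin_cpu_supports`; AVX512DQ and AVX512BW are just example subsets, and the names `avx512_features`, `detect_avx512` and `can_use_dq_bw_kernel` are invented for illustration.

```c
#include <stdbool.h>

/* Illustrative struct; not part of any real API. */
struct avx512_features {
    bool f;   /* AVX512F, the foundation subset */
    bool dq;  /* AVX512DQ */
    bool bw;  /* AVX512BW */
};

/* Query each AVX512 subset separately; none of them implies another
   except that all of them build on AVX512F. */
static struct avx512_features detect_avx512(void)
{
    struct avx512_features a = { false, false, false };
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    a.f  = __builtin_cpu_supports("avx512f") != 0;
    a.dq = __builtin_cpu_supports("avx512dq") != 0;
    a.bw = __builtin_cpu_supports("avx512bw") != 0;
#endif
    return a;
}

/* A kernel that uses DQ and BW instructions needs both bits plus AVX512F. */
static bool can_use_dq_bw_kernel(struct avx512_features a)
{
    return a.f && a.dq && a.bw;
}
```

A kernel that needs only AVX512F would test `a.f` alone, so it can still run on hardware that lacks the other subsets.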
Confirming your point about popcnt having its own feature bit: I peeked into MSVC's library implementation of `std::popcount`. If AVX is defined, it assumes the popcnt intrinsic is available. If AVX is not defined, it does a runtime check of the architecture for `__ISA_AVAILABLE_SSE42`. In other words, if you target AVX, the implementation assumes that implies SSE4.2, including popcnt. – Adrian McCarthy Feb 14 '21 at 19:41
If we set the compiler option `-mavx2`, GCC doesn't give an error when we use AVX or SSE intrinsics. So GCC assumes that the AVX2 flag is enough to run AVX and SSE code. Of course, that does not guarantee that someone won't create a CPU with AVX2 and without SSE.
I guess the question can be rephrased asking if somebody has already made a CPU (commercial) with AVX2 and without SSE. – rubund Nov 23 '18 at 12:59
@Ruben I think creating such a CPU makes no sense. When I write code with AVX2 (I have been doing this for over 5 years) I of course also use AVX and SSE code, and I don't have any trouble. – ErmIg Nov 23 '18 at 13:07
Such a CPU would not be commercially viable because it couldn't run real-world existing code. Except possibly in a Xeon-Phi type of device where it's only ever expected to run code compiled specifically for it. (But if so, why would you ever pick x86 instead of a cleaner and cheaper-to-license ISA like AArch64 or RISC-V, unless you're Intel... But anyway, actual Xeon Phi devices *do* support legacy-SSE up to 4.2, and AVX1/2, and MMX / x87 so you can run existing binaries. The point of my example was an even more special-purpose device.) – Peter Cordes Feb 14 '21 at 23:37
In principle, a CPU could support AVX2 without supporting any SSE4 instructions (which isn't as stupid an idea as it sounds!). In practice, though, if it supports AVX2, it also supports SSE4.
I'm pretty sure this is not true. AVX2 implies AVX, and AVX implies that the VEX encoding of SSE4.2 instructions like `vpcmpistri` is available. I *think* it also implies that the non-VEX encoding is available, too. In theory you could make a CPU which didn't accept the non-VEX encoding, but I think you'd be violating Intel's guarantees about what the AVX feature bit implies. Not sure if that's formally written down in a manual, though. – Peter Cordes Nov 28 '18 at 03:58