
With GCC, the -ftree-vectorize option turns on auto-vectorization, and this flag is set automatically at -O3. Which instruction set does it vectorize for? That is, will I get SSE2, SSE4.2, AVX, or AVX2 instructions? I know about the -mavx and -mavx2 flags and the like, but I want to know what the compiler does without those flags forcing a particular kind of vectorization.

Z boson
R_Kapp
  • I'm assuming you're interested only in the x86 instruction set? Your question could apply to other architectures as well such as Neon with ARM. – Z boson Nov 10 '15 at 09:47
  • I think actually only `-ftree-slp-vectorize` is turned on by `-O3`, not `-ftree-vectorize`. [doc](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options) – Romeo Valentin Jul 05 '20 at 17:54
  • @RomeoValentin Both `-ftree-loop-vectorize` and `-ftree-slp-vectorize` are turned on at `-O2`, which seems to be consistent with the definition of [`-ftree-vectorize`](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ftree-vectorize) – malat Nov 04 '21 at 08:11

1 Answer


All 64-bit x86 processors support at least SSE2, so GCC defaults to SSE2 code in 64-bit mode unless you tell it to use other hardware options.
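One way to see which level a given set of flags actually enables is to look at the predefined macros that GCC and Clang set for each extension (`__SSE2__`, `__AVX__`, `__AVX2__`, and so on). The following is a minimal sketch; the function names `compiled_simd_level` and `print_compiled_simd_level` are made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: names the highest x86 SIMD extension that the
   compiler was allowed to use for this translation unit. It relies only
   on the feature macros that GCC and Clang predefine. */
const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none (or not an x86 target)";
#endif
}

/* Hypothetical convenience wrapper that prints the result. */
void print_compiled_simd_level(void)
{
    printf("Highest SIMD level enabled at compile time: %s\n",
           compiled_simd_level());
}
```

Calling `print_compiled_simd_level()` from `main` in a plain 64-bit x86 build with no `-m` flags should report SSE2, and rebuilding with `-mavx2` should report AVX2. Running `gcc -dM -E - < /dev/null` lists the same macros without compiling any program.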

In 32-bit mode GCC may use x87 instructions for floating point, which are not SIMD instructions. To get vectorization there, enable at least SSE, e.g. with -mfpmath=sse -msse2.

If you enable higher SIMD options (e.g. -mavx2), the compiler may, and in many cases will, use those newer instructions when vectorizing.
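To check this on your own code, a small loop that the auto-vectorizer usually handles is enough. The sketch below assumes the file is saved as `saxpy.c` (a name chosen only for this example). The `restrict` qualifiers are there so the compiler does not have to assume that `x` and `y` overlap.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   The restrict qualifiers promise that x and y do not alias, which
   makes the loop an easy target for the auto-vectorizer.

   Suggested experiments (saxpy.c is an assumed file name):
     gcc -O3 -S saxpy.c                    inspect saxpy.s for xmm registers
     gcc -O3 -mavx2 -S saxpy.c             inspect saxpy.s for ymm registers
     gcc -O3 -fopt-info-vec -c saxpy.c     ask GCC to report vectorized loops */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In a default 64-bit build the generated assembly typically uses 128-bit `xmm` registers (SSE2), while with `-mavx2` GCC typically switches to 256-bit `ymm` registers. The exact output depends on the GCC version and its cost model.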

I believe this is true of Clang as well. However, ICC and MSVC do things differently. ICC may create a CPU dispatcher to select the best hardware (or to veto AMD hardware). MSVC only has options for enabling AVX and AVX2 in 64-bit mode (SSE2 is assumed). There is no way to explicitly enable e.g. SSE4.1 with MSVC. Instead, in some cases the auto-vectorizer will add code to check for SSE4.1 (but not AVX) and use those instructions. GCC will only use SSE4.1 if you tell it to, e.g. with -msse4.1 or something higher such as -mavx.
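Since GCC does not add such runtime checks on its own when auto-vectorizing, a program that wants dispatcher-like behaviour has to query the CPU itself. The following is a rough sketch using the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets. The function name `runtime_simd_level` and the `HAVE_X86_CPU_BUILTINS` macro are invented for this example, and this is not how ICC's dispatcher is implemented.

```c
#include <stddef.h>

/* The CPU-detection builtins exist only in GCC-compatible compilers
   targeting x86, so guard their use. (Macro name is hypothetical.) */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

/* Hypothetical helper: reports the highest SIMD level (among the ones
   checked here) that the CPU running the program supports. This is a
   runtime property, independent of the -m flags used to compile. */
const char *runtime_simd_level(void)
{
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();  /* initialise the CPU feature data */
    if (__builtin_cpu_supports("avx2"))
        return "AVX2";
    if (__builtin_cpu_supports("avx"))
        return "AVX";
    if (__builtin_cpu_supports("sse4.1"))
        return "SSE4.1";
    if (__builtin_cpu_supports("sse2"))
        return "SSE2";
#endif
    return "unknown";
}
```

A program could branch on this result to call a version of a hot function that was compiled in a separate translation unit with, say, `-mavx2`, while keeping the rest of the build at the SSE2 baseline.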

Z boson