How can I use SSE2 in GCC? I want to work with double values.

I'm looking for something like this: http://vrm-vrm.blogspot.com/2009/10/gcc-intrinsics.html but for double values.

cl_progger
  • You could just compile with `-msse` etc., or with `-march=native`... – Kerrek SB Nov 14 '11 at 18:00
  • That's all? I heard it would be complicated and one has to use intrinsics. – cl_progger Nov 14 '11 at 18:01
  • You have to use intrinsics if you want to use explicit constructions. With the compiler flags, you just tell the compiler that it's OK to use the hardware when available and when the optimizer determines that it's a good choice. Use in conjunction with some `-O` level. There's no guarantee, but give it a try and compare the assembly. – Kerrek SB Nov 14 '11 at 18:03
  • If you're using a relatively new CPU (< 5 years old) then you may not see much benefit from SSE with double precision, since most modern x86 CPUs have two FPUs now and you only get 2 way SIMD with double precision on SSE. – Paul R Nov 14 '11 at 18:42

1 Answer

If you want to use the SSE2 double insns, you have to compile with `gcc -mfpmath=sse -msse2`.

The option `-msse2` alone allows you to use SSE2 intrinsics; `-mfpmath=sse` additionally causes GCC to emit SSE/SSE2 insns for all FP operations. (On x86-64 targets both are already the default; the flags matter for 32-bit code.)
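For the intrinsics route, here is a minimal sketch of adding two arrays of doubles with SSE2. The function name `add_doubles_sse2` is made up for illustration; the intrinsics themselves (`_mm_loadu_pd`, `_mm_add_pd`, `_mm_storeu_pd`) come from `<emmintrin.h>`. It assumes nothing about alignment, so it uses the unaligned load/store variants.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d and the _mm_*_pd family */

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
   Each __m128d holds two doubles, so the main loop handles two elements
   per iteration; a scalar loop picks up the odd element, if any. */
void add_doubles_sse2(const double *a, const double *b, double *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);            /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  /* packed add, unaligned store */
    }
    for (; i < n; ++i)                               /* scalar tail */
        out[i] = a[i] + b[i];
}
```

Something like `gcc -O2 -msse2 -c file.c` should be enough to build it. If you can guarantee 16-byte alignment, `_mm_load_pd` / `_mm_store_pd` are the aligned counterparts.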

Also note that auto-vectorization is enabled at `-O3` (or explicitly with `-ftree-vectorize`).
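If you would rather let the compiler do the work, a plain loop like the sketch below is the kind of code GCC's vectorizer can typically turn into packed SSE2 insns. The name `scale_add` is hypothetical, and whether the loop actually gets vectorized depends on your GCC version and flags, so check the generated assembly rather than taking it on faith.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y[i] = alpha * x[i] + y[i] for i in [0, n).
   `restrict` promises the compiler that x and y do not overlap, which
   removes one common obstacle to auto-vectorization. */
void scale_add(double alpha, const double *restrict x, double *restrict y, size_t n)
{
    assert(x != NULL && y != NULL);

    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

Compile with something like `gcc -O3 -msse2 -mfpmath=sse -S file.c` and look for `mulpd` / `addpd` in the output; the scalar versions are `mulsd` / `addsd`. Recent GCC versions also accept `-fopt-info-vec` to report which loops were vectorized.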

The advantages of vectorized SSE2-4 insns are obvious: each 128-bit SSE register holds two doubles, and with AVX, Sandy Bridge processors can execute up to three 256-bit operations per cycle (for example 4 double multiplies, 4 double additions, and a shuffle on top of that).

Even without vectorization, Intel's optimization manual recommends using SSE for scalar operations, for reasons including the flat register model and shorter latencies compared to legacy x87 insns.

EDIT:

Forgot to mention: for 32-bit code, you may also add `-msseregparm`, which causes FP arguments and return values to be passed via SSE registers. By default they are passed in memory (on the stack) and in `%st(0)`, respectively. Naturally, this changes the ABI, so all interacting modules have to be compiled with this option.

chill