
I have to multiply a vector of integers by another vector of integers, and then add the result (also a vector of integers) to a vector of floating-point values.

Should I use MMX or SSE4 for the integers, or can I just use SSE for all of these values (even the integers), putting the integers in __m128 registers?

I often put integers in __m128 registers, and I don't know whether I am wasting time (through implicit conversion of the values) or whether it makes no difference.

I am compiling with the -O3 option.

Paul R
LuapaJ
    Show us what you want to do with some code. Show the scalar code of what you want to do as Paul R suggested. – Z boson Apr 22 '15 at 07:58

2 Answers


You should probably just use SSE for everything (MMX is just a very outdated precursor to SSE). If you're going to be targeting mainly newer CPUs, then you might even consider AVX/AVX2.
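On the integer question specifically, here is a minimal sketch rather than a definitive implementation. SSE2 has its own integer register type, `__m128i`, plus an explicit int-to-float conversion (`_mm_cvtepi32_ps`), so the integers do not need to live in `__m128`. The function names `mul_add_i32_f32` and `mullo_i32_sse2`, the 32-bit element width, and the requirement that the length be a multiple of 4 are all assumptions made for illustration; the question does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: __m128i, integer ops, int -> float conversion */

/* Low 32 bits of each 32-bit product, using SSE2 only.
   With SSE4.1 available this is a single _mm_mullo_epi32(a, b). */
static __m128i mullo_i32_sse2(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);                 /* products of lanes 0, 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4)); /* products of lanes 1, 3 */
    /* Gather the low halves of the 64-bit products, then interleave them. */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    return _mm_unpacklo_epi32(even, odd);
}
#endif

/* Hypothetical helper: out[i] = (float)(a[i] * b[i]) + c[i]. */
void mul_add_i32_f32(const int32_t *a, const int32_t *b,
                     const float *c, float *out, size_t n)
{
    assert(n % 4 == 0); /* sketch assumption: whole 4-lane vectors only */
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 4) {
        __m128i va   = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb   = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i prod = mullo_i32_sse2(va, vb);   /* integer multiply in __m128i */
        __m128  fp   = _mm_cvtepi32_ps(prod);    /* explicit int -> float */
        __m128  sum  = _mm_add_ps(fp, _mm_loadu_ps(c + i));
        _mm_storeu_ps(out + i, sum);
    }
#else
    /* Portable fallback when SSE2 is not enabled at compile time. */
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)(a[i] * b[i]) + c[i];
#endif
}
```

The conversion happens exactly once per vector, at `_mm_cvtepi32_ps`; nothing is converted implicitly. Note that the SSE2 path wraps on overflow, whereas signed overflow in the scalar fallback is undefined behaviour in C.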

Start by implementing everything cleanly and robustly in scalar code, then benchmark it. A scalar implementation may turn out to be fast enough, in which case you won't need to do anything else. Furthermore, gcc and other compilers (e.g. clang, ICC, even Visual Studio) are getting reasonably good at auto-vectorization, so you may get SIMD-vectorized code "for free" that meets your performance needs. If you still need better performance at this point, you can start converting your scalar code to SSE. Keep the original scalar implementation for validation and benchmarking, though: it's very easy to introduce bugs when optimising code, and it's useful to know how much faster your optimised code is than the baseline (you're probably looking for somewhere between 2x and 4x faster for SSE versus scalar code).
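As a rough sketch of the "keep the scalar version for validation" advice, the code below pairs a scalar baseline with a check against an optimised candidate. All names (`mul_add_ref`, `mul_add_opt`, `max_abs_diff`, `check_against_reference`) are hypothetical, and `mul_add_opt` is only a hand-unrolled stand-in for whatever SSE version you end up writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar baseline: simple enough to trust, and a candidate for
   compiler auto-vectorization at -O3. */
void mul_add_ref(const int32_t *a, const int32_t *b,
                 const float *c, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)(a[i] * b[i]) + c[i];
}

/* Stand-in for the optimised version (here just unrolled by 4);
   replace the body with the real SSE code. */
void mul_add_opt(const int32_t *a, const int32_t *b,
                 const float *c, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i + 0] = (float)(a[i + 0] * b[i + 0]) + c[i + 0];
        out[i + 1] = (float)(a[i + 1] * b[i + 1]) + c[i + 1];
        out[i + 2] = (float)(a[i + 2] * b[i + 2]) + c[i + 2];
        out[i + 3] = (float)(a[i + 3] * b[i + 3]) + c[i + 3];
    }
    for (; i < n; ++i) /* leftover elements */
        out[i] = (float)(a[i] * b[i]) + c[i];
}

/* Largest absolute element-wise difference between two result arrays. */
float max_abs_diff(const float *x, const float *y, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Run both versions on the same input and fail loudly if they disagree
   by more than tol. */
void check_against_reference(const int32_t *a, const int32_t *b,
                             const float *c, size_t n,
                             float *ref_out, float *opt_out, float tol)
{
    mul_add_ref(a, b, c, ref_out, n);
    mul_add_opt(a, b, c, opt_out, n);
    assert(max_abs_diff(ref_out, opt_out, n) <= tol);
}
```

Running the same check with a length that is not a multiple of 4 is worthwhile, since tail handling is a common place for bugs in vectorized code. Timing both functions on identical input then gives the speed-up figure relative to the baseline.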

Paul R

While the previous answer is reasonable, it leaves out one significant point: data organization. To use SSE directly, the data is better organized as a Structure-of-Arrays (SoA). Your scalar code may well be built around an Array-of-Structures (AoS) layout. If that is the case, converting from scalar to vectorized form would be difficult.
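To make the two layouts concrete, here is a small sketch. The type and function names (`ItemAoS`, `ItemsSoA`, `aos_to_soa`, `mul_add_soa`) and the field names are hypothetical, chosen to match the multiply-then-add operation from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Array-of-Structures: the fields of one element sit next to each other,
   so the a values of consecutive elements are 12 bytes apart. */
typedef struct {
    int32_t a;
    int32_t b;
    float   c;
} ItemAoS;

/* Structure-of-Arrays: each field has its own contiguous array, so four
   consecutive a values can be loaded into one SSE register directly. */
typedef struct {
    int32_t *a;
    int32_t *b;
    float   *c;
    size_t   n;
} ItemsSoA;

/* One-off conversion; the caller provides arrays of out->n elements. */
void aos_to_soa(const ItemAoS *in, ItemsSoA *out)
{
    assert(out->a && out->b && out->c);
    for (size_t i = 0; i < out->n; ++i) {
        out->a[i] = in[i].a;
        out->b[i] = in[i].b;
        out->c[i] = in[i].c;
    }
}

/* result[i] = (float)(a[i] * b[i]) + c[i] over the SoA layout.
   Every access is unit-stride, which suits both hand-written SSE
   and the compiler's auto-vectorizer. */
void mul_add_soa(const ItemsSoA *s, float *result)
{
    for (size_t i = 0; i < s->n; ++i)
        result[i] = (float)(s->a[i] * s->b[i]) + s->c[i];
}
```

If the data can be produced in SoA form from the start, the conversion step disappears; converting on every call can eat into whatever SIMD gains.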

Further reading: https://software.intel.com/en-us/articles/creating-a-particle-system-with-streaming-simd-extensions

Severin Pappadeux