Questions tagged [sse2]

x86 Streaming SIMD Extensions 2 adds support for packed integers and double-precision floats in the 128-bit XMM vector registers. It is always supported on x86-64, and supported on virtually every x86 CPU from 2003 or later.

See the x86 tag wiki for guides and other resources for programming and optimising programs using x86 vector extensions, and the SSE tag wiki for other SSE- and SSE2-related resources.


SSE2 is one of the SSE family of x86 instruction-set extensions.

SSE2 adds support for double-precision floating point and for packed integers (8-bit to 64-bit elements) in XMM registers. It is baseline in x86-64, so 64-bit code can always assume SSE2 support without checking for it. 32-bit code could still be run on a CPU from before 2003 (Athlon XP or Pentium III) that doesn't support SSE2, but this is unlikely for most newly written code, so an MMX or original-SSE fallback is usually not worth writing.
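For 32-bit builds that still want to check at run time, a minimal sketch might look like the following. It assumes GCC or Clang, whose `__builtin_cpu_supports` built-in is compiler-specific; `have_sse2` is a hypothetical helper name, not part of any library. Other compilers would need their own CPUID wrapper.

```c
#include <stdbool.h>

/* Hypothetical helper: report whether SSE2 instructions may be used.
 * Assumes GCC or Clang for the 32-bit run-time check. */
static bool have_sse2(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* SSE2 is part of the x86-64 baseline, so no check is needed. */
    return true;
#elif defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
    /* 32-bit x86: ask the CPU at run time via the compiler built-ins. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    /* Unknown target or compiler: be conservative. */
    return false;
#endif
}
```

A caller would typically test this once at start-up and select an SSE2 or scalar code path through a function pointer.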

Most tasks that benefit from vectors at all can be done fairly efficiently using only instructions up to SSE2. This is fortunate, because widespread support for later SSE versions took time. Use of later SSE extensions typically saves a couple of instructions here and there, usually with only minor speed-ups. Notably absent until SSSE3 was PSHUFB, a byte shuffle controlled by elements in a register rather than by a compile-time constant imm8. It can do things that SSE2 can't do efficiently at all.
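As an illustration of the compile-time-constant shuffles that SSE2 does provide, here is a small sketch using the `_mm_shuffle_epi32` intrinsic (PSHUFD). The helper name `reverse4_epi32` is made up for this example. The shuffle pattern is fixed in the imm8 operand, so it cannot depend on run-time data the way PSHUFB's control vector can.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Hypothetical helper: reverse the order of four 32-bit elements.
 * The shuffle control is an immediate, fixed at compile time. */
static void reverse4_epi32(const int32_t in[4], int32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Result element 0 takes source element 3, element 1 takes 2, etc. */
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_si128((__m128i *)out, r);
}
```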

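AVX provides VEX-encoded, non-destructive 3-operand versions of the SSE2 vector instructions (for example, vpaddd xmm0, xmm1, xmm2), so the destination no longer has to overwrite one of the sources.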

History

Intel introduced SSE2 with their Pentium 4 design in 2001.

AMD adopted SSE2 for its 64-bit CPU line in 2003/2004. As of 2009, few if any x86 CPUs without SSE2 remain in use in significant numbers. This makes SSE2 extremely attractive on the Windows PC platform: it offers a large feature set that can practically be assumed to be an omnipresent "minimum requirement". At least in 32-bit mode, however, this does not remove the need to check processor features.

More recent instruction sets introduce fewer features, which are often highly specialized. They are also supported inconsistently between manufacturers, and by a significantly smaller share of processors (10-50% in 2009).

SSE2 does not offer instructions for horizontal addition, which is needed for some geometric calculations (e.g. dot product) and for complex arithmetic. This functionality has to be emulated with one or more shuffles followed by ordinary vertical adds, which is often not significantly slower than the dedicated instructions in later extensions (e.g. HADDPS/HADDPD in SSE3).
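The shuffle-based emulation described above can be sketched with SSE2 intrinsics as follows. The function names `hsum_epi32` and `dot2_pd` are hypothetical, chosen only for this example; this is one common pattern, not the only way to do it.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Hypothetical helper: sum the four 32-bit elements of v. */
static int32_t hsum_epi32(__m128i v)
{
    /* Swap the two 64-bit halves, then add: element 0 holds v0+v2,
     * element 1 holds v1+v3. */
    __m128i hi64  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum64 = _mm_add_epi32(v, hi64);
    /* Swap adjacent 32-bit elements, then add: element 0 holds the total. */
    __m128i hi32  = _mm_shuffle_epi32(sum64, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i sum32 = _mm_add_epi32(sum64, hi32);
    return _mm_cvtsi128_si32(sum32);
}

/* Hypothetical helper: dot product of two 2-element double vectors. */
static double dot2_pd(const double a[2], const double b[2])
{
    __m128d prod = _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    /* Bring the high element down to the low position, then add. */
    __m128d hi = _mm_unpackhi_pd(prod, prod);
    return _mm_cvtsd_f64(_mm_add_sd(prod, hi));
}
```

In a loop, it is usually better to accumulate vertically with `_mm_add_epi32` or `_mm_add_pd` and do a single horizontal sum like this after the loop.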

275 questions
-1
votes
1 answer

SSE2 code optimization to compress an image

I want to optimize the for loop with SSE/SSE2 instructions for a better time in image compression. size_t height = get_height(); size_t width = get_width(); size_t total_size = height * width * 3; uint8_t *src = get_pixels(); uint8_t *dst = new…
casian
  • 1
-1
votes
1 answer

Determine whether eigen has optimized code for SSE instructions or not

I am having a code which is using Eigen::vectors, I want to confirm that Eigen has optimized this code for SSE or not. I am using Visual Studio 2012 Express, in which i can set the command line option "/Qvec-report:2" which gives the optimization…
-1
votes
1 answer

Strange SIMD instruction behavior

SSE2 instruction (paddd xmm, m128) works really strange. Code tells all. #include using namespace std; int main() { int * v0 = new int [80]; for (int i=0; i<80; ++i) v0[i] = i; int * v1 = new int [80]; for…
dev1223
  • 1,148
  • 13
  • 28
-2
votes
1 answer

What are some rules of thumb for when SIMD would be faster? (SSE2, AVX)

I have some code that operates on 3 symmetric sets of 3 asymmetric integer values at a time. There is a significant amount of conditional code and lots of constants. This has become a perf bottleneck and I'm looking for some rules of thumb for…
Tumbleweed53
  • 1,491
  • 7
  • 13
-4
votes
1 answer

Two 32-bit signed integers Multiplication using SSE2

How can I multiply two signed 32-bit integers using SSE2 instruction set?
user2003619
  • 49
  • 1
  • 4