Questions tagged [sse2]

x86 Streaming SIMD Extensions 2 adds support for packed integer and double-precision floats in the 128-bit XMM vector registers. It is always supported on x86-64, and supported on every x86 CPU from 2003 or later.

See the x86 tag wiki for guides and other resources for programming and optimising programs using x86 vector extensions, and the SSE tag wiki for other SSE- and SSE2-related resources.

SSE2 is one of the SSE family of x86 instruction-set extensions.

SSE2 adds support for double-precision floating point and packed integers (8-bit to 64-bit elements) in XMM registers. It is baseline in x86-64, so 64-bit code can always assume SSE2 support without having to check. 32-bit code could still be run on a CPU from before 2003 (Athlon XP or Pentium III) that doesn't support SSE2, but this is unlikely for most newly-written code. (And so an MMX or original-SSE fallback is not worth writing.)
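As a rough sketch of the distinction above, the helper below returns true unconditionally on x86-64 and falls back to a runtime query in 32-bit builds. The name `have_sse2` is hypothetical, and the runtime branch assumes GCC or Clang, whose `__builtin_cpu_supports` builtin is used here. Other compilers would need their own CPUID-based check.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether SSE2 instructions may be used. */
static bool have_sse2(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* SSE2 is part of the x86-64 baseline, so no check is needed. */
    return true;
#elif defined(__GNUC__) && defined(__i386__)
    /* 32-bit build with GCC or Clang: ask the CPU at run time. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    /* Unknown compiler or target: conservatively report no SSE2. */
    return false;
#endif
}
```

A 32-bit program would call this once at startup and pick a code path based on the result.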

Most tasks that benefit from vectors at all can be done fairly efficiently using only instructions up to SSE2. This is fortunate, because widespread support for later SSE versions took time. Use of later SSE extensions typically saves a couple instructions here and there, usually with only minor speed-ups. Notably absent until SSSE3 was PSHUFB, a shuffle whose operation was controlled by elements in a register, rather than a compile-time constant imm8. It can do things that SSE2 can't do efficiently at all.
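To illustrate what a compile-time constant imm8 shuffle looks like, here is a minimal sketch using the SSE2 `pshufd` instruction through the `_mm_shuffle_epi32` intrinsic. The function name `reverse4_epi32` is made up for this example. The shuffle pattern is fixed when the code is compiled, which is exactly the limitation that PSHUFB later removed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical example: reverse the order of four 32-bit integers. */
static void reverse4_epi32(const int32_t *src, int32_t *dst)
{
    /* Unaligned load, so src does not need 16-byte alignment. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);

    /* The second argument must be a compile-time constant (imm8).
       _MM_SHUFFLE(0, 1, 2, 3) selects source elements 3, 2, 1, 0
       for result elements 0, 1, 2, 3. */
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));

    _mm_storeu_si128((__m128i *)dst, r);
}
```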

AVX provides 3-operand versions of all SSE2 instructions.

History

Intel introduced SSE2 with their Pentium 4 design in 2001.

SSE2 was adopted by AMD for its 64-bit CPU line in 2003/2004. As of 2009, few if any x86 CPUs (at least, in any significant numbers) lack the SSE2 instruction set. That makes it extremely attractive on the Windows PC platform: it offers a large feature set that can practically be treated as an omnipresent "minimum requirement". In 32-bit mode, however, this does not remove the need to check processor features.

More recent instruction sets introduce fewer features, which are often highly specialized, and are supported inconsistently between manufacturers and by a significantly smaller share of processors (10-50% in 2009).

SSE2 does not offer instructions for horizontal addition, which are needed for some geometric calculations (e.g. dot product) and complex arithmetic. This functionality has to be emulated with one or more shuffles, which are often not significantly slower than the dedicated instructions in later revisions.
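A minimal sketch of that shuffle-based emulation, assuming four single-precision elements: the helper names `hsum_ps_sse2` and `dot4_sse2` are hypothetical, and this is one common shuffle sequence rather than the only possible one.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE float intrinsics */

/* Hypothetical helper: horizontal sum of [a b c d] without haddps. */
static float hsum_ps_sse2(__m128 v)
{
    /* Swap neighbours within each pair: [a b c d] -> [b a d c]. */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* sums = [a+b, a+b, c+d, c+d]. */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Move the high half of sums into the low half: low element = c+d. */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Scalar add of the low elements: (a+b) + (c+d). */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* Hypothetical helper: dot product of two 4-element float arrays. */
static float dot4_sse2(const float *a, const float *b)
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return hsum_ps_sse2(prod);
}
```

In a loop over many vectors it is usually better to accumulate vertically with `_mm_add_ps` and do a single horizontal sum at the end.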

275 questions

votes

1 answer

boost::shared_array and aligned memory allocation

In Visual C++, I'm trying to dynamically allocate some memory which is 16-byte aligned so I can use SSE2 functions that require memory alignment. Right now this is how I allocate the memory: boost::shared_array aData(new unsigned…

asked Oct 22 '10 at 17:33

Warpin


votes

1 answer

Assembly "movdqa" access violation

I am currently trying to write a function in assembly and I want to move 128 bits of a string located at the memory address stored in rdx into the xmm1 register. If I use movdqa xmm1, [rdx], I get an access violation exception while reading at…

assembly masm sse2

asked Oct 11 '16 at 20:06

Ben

votes

1 answer

Scaling of a complex vector using SSE

I want to apply SSE instructions to a vector containing complex numbers. Without SSE instructions, I can do it with the following code. However, when I apply SSE instructions, I don't know how to get the calculated real and imaginary part back to…

c x86 sse simd sse2

asked May 04 '16 at 15:53

Nils

votes

1 answer

De-interleave image channel in SSE 16 bit vectors

I have a 32 bpp image. I need to de-interleave the R, G, B color channels into different 16-bit vectors. I am using the following code to do that (how to deinterleave image channel in SSE) // deinterleave channel R, G, B, A in 16 bits vectors { …

x86 sse simd intrinsics sse2

asked Mar 09 '16 at 16:53

Bharat Ahuja

votes

1 answer

how to deinterleave image channel in SSE

Is there any way we can de-interleave 32bpp image channels, similar to the NEON code below? //Read all r,g,b,a pixels into 4 registers uint8x8x4_t SrcPixels8x8x4= vld4_u8(inPixel32); ChannelR1_32x4 =…

image-processing sse simd sse2

asked Mar 08 '16 at 15:38

Bharat Ahuja

votes

1 answer

Clang-cl fails to build NSS lib due to emmintrin.h even with -msse2 flag

The freebl library in NSS fails to build properly (as a part of Firefox) due to emmintrin.h header from Clang 3.7 throwing errors that I'd assume were due to a missing -msse2 flag. Even with this flag, the source file that calls this header…

c++ c clang sse2 nss

asked Jun 29 '15 at 12:22

SRG3006

votes

1 answer

SSE Sum of multiplication of 4 32-bit integers

Thanks to this post I found out how to multiply 4 32-bit integers. What I want to do now is sum up the results. How can I do this using intrinsics? I've got access to SSE, SSE2 and AVX. My initial thoughts were to unload res into an int array and…

c sse simd avx sse2

asked May 17 '15 at 15:47

Harrold

votes

2 answers

Assembly "dec" instruction for XMM

I'm currently passing an external parameter from C to ASM using the following: myFunction proc myVar:qword public myFunction movdqu xmm3,oword ptr myVar myFunction endp Ultimately, I want to do something similar to the below but first need…

assembly masm sse2 sse

asked Mar 19 '15 at 04:04

Elegant

votes

1 answer

uint64 array to uint128 for SSE2

I have two similar issues when handling arrays when defined in the asm and when passed from c++ to asm. The code works fine inline but I need to separate them from the cpp into an asm file. The compiler may not throw an error or warning but the end…

c++ arrays assembly sse2 uint64

asked Mar 07 '15 at 19:48

Elegant

votes

1 answer

Porting code frag from MMX to SSE2 asm

I'm trying to port some code from MMX to SSE2 and having a bit of trouble in doing so. For MMX I have: .data align 16 onesByte qword 2 dup(0101010101010101h) ... psubusb mm2,onesByte psubusb mm0,onesByte For SSE2 I have: …

assembly masm sse2 mmx

asked Mar 07 '15 at 00:19

Elegant

votes

1 answer

How to examine a 256i (16-bit) vector to know if it contains any element greater than zero?

I am converting a vectorized code from SSE2 intrinsics to AVX2 intrinsics, and would like to know how to check if a 256i (16-bit) vector contains any element greater than zero or not. Below is the code used in the SSE2: int check2(__m128i vector1,…

c simd sse2 avx2

asked Feb 23 '15 at 23:17

MROF

votes

4 answers

implement SIMD in C++

I'm working on a bit of code and I'm trying to optimize it as much as possible, basically get it running under a certain time limit. The following makes the call... static affinity_partitioner ap; parallel_for(blocked_range(0, T),…

c++ simd sse2

asked Apr 29 '10 at 16:53

Hristo


votes

1 answer

SSE2 intrinsics - comparing 2 __m128i's containing 4 int32's each to see how many are equal

I'm diving in SSE2 intrinsics for the first time and I'm not sure how to do this. I want to compare 4 int32's to 4 other int32's and count how many are equal. So I read my first 4 int32's, set them in a __m128i, do the same for the second set, and…

count comparison sse intrinsics sse2

asked Jul 08 '14 at 19:09

Pygmy


votes

1 answer

Visual Studio 2013 express SSE2 disable

I tried to rebuild an MSVC 2013 project with SSE2 features disabled, but it didn't help. Should I rebuild the glew and GLFW libraries that are used? The project is motogame, a part of motocoin http://motocoin.org/ . I can't run this game because my…

visual-c++ sse2

asked Jun 10 '14 at 17:33

N00b

votes

0 answers

_mm_load_si128 - Passed memory address is not 16-byte-aligned?

I'm having some trouble understanding an SSE2 instruction. According to the Microsoft documentation, _mm_load_si128 requires a 16-byte-aligned address as parameter. In the code I am trying to understand, this seems not to be the case: void f(uchar*…

c++ simd sse2

asked Apr 16 '14 at 10:39

Simon Oelmann
