Questions tagged [sse4]

Intel's Streaming SIMD Extensions 4 instruction set for Intel Core architecture x86 processors and AMD's K10 x86 processors. Intel's SSE4 comprises 54 instructions: 47 in SSE4.1 and 7 more in SSE4.2.

The tag also covers AMD's SSE4a instruction set, which is a separate extension. More detailed information on the new instructions can be found in both Intel's and AMD's developer manuals or, more conveniently, on Wikipedia.

55 questions
6
votes
3 answers

Compare strings by SSE4 wrappers

I need to quickly compare two strings on a machine with SSE4 support. How can I do it without writing inline assembly? A wrapper like long long bitmask = strcmp(char* a, char* b) would be perfect.
Nelson Tatius
  • 7,693
  • 8
  • 47
  • 70
5
votes
3 answers

How to simulate pcmpgtq on sse2?

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating SSE4.2? Update: This same question applies to ARMv7 with…
Dan Weber
  • 401
  • 2
  • 9
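For the PCMPGTQ question above, one commonly used emulation can be sketched with SSE2 intrinsics only. This is a minimal sketch, not the accepted answer: the function names sse2_cmpgt_epi64 and cmpgt_epi64_lane0 are hypothetical. The idea is that a > b holds when the high 32 bits compare greater (signed), or when the high halves are equal and the 64-bit difference b - a borrows into the high half.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: signed 64-bit a > b per lane, using SSE2 only. */
static inline __m128i sse2_cmpgt_epi64(__m128i a, __m128i b)
{
    /* Where the high dwords are equal, the high dword of (b - a) is all
       ones exactly when the low dword of a is larger (unsigned). */
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));
    /* Where the high dword of a is larger (signed), the result is true. */
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));
    /* Broadcast the high dword of each 64-bit lane across the lane. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Hypothetical helper: compare one scalar pair and return the lane-0 mask
   (-1 for true, 0 for false). */
static inline int64_t cmpgt_epi64_lane0(int64_t a, int64_t b)
{
    int64_t out[2];
    __m128i m = sse2_cmpgt_epi64(_mm_set1_epi64x(a), _mm_set1_epi64x(b));
    _mm_storeu_si128((__m128i *)out, m);
    return out[0];
}
```

The result has the same all-ones or all-zeros lane layout that _mm_cmpgt_epi64 produces, so it can be used as a drop-in mask on hardware without SSE4.2.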
5
votes
3 answers

Does .NET Framework 4.5 provide SSE4/AVX support?

I think I heard about that, but I don't remember where. Update: I am talking about the JIT.
Arman Hayots
  • 2,459
  • 6
  • 29
  • 53
4
votes
2 answers

SSE4.1 unsigned integer comparison with overflow

Is there any way to perform a comparison like C >= (A + B) with SSE2/4.1 instructions, considering that 16-bit unsigned addition (_mm_add_epi16()) can overflow? The code snippet looks like: #define _mm_cmpge_epu16(a, b) _mm_cmpeq_epi16(_mm_max_epu16(a,…
Kaustubh
  • 73
  • 4
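For the overflow question above, one way to avoid the wrapping addition entirely is to rewrite C >= A + B as (C >= A) and (C - A >= B), which never leaves the 16-bit range. The sketch below uses only SSE2 saturating subtraction; the names cmpge_epu16_sse2, cmpge_sum_epu16 and cmpge_sum_u16_lane0 are hypothetical and this is not the macro from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: unsigned 16-bit a >= b. The saturating difference
   b - a is zero exactly when b <= a. */
static inline __m128i cmpge_epu16_sse2(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi16(_mm_subs_epu16(b, a), _mm_setzero_si128());
}

/* Hypothetical name: c >= a + b, evaluated as if the sum could not wrap. */
static inline __m128i cmpge_sum_epu16(__m128i c, __m128i a, __m128i b)
{
    __m128i c_ge_a = cmpge_epu16_sse2(c, a);
    __m128i diff = _mm_subs_epu16(c, a);  /* c - a when c >= a, else 0 */
    __m128i diff_ge_b = cmpge_epu16_sse2(diff, b);
    return _mm_and_si128(c_ge_a, diff_ge_b);
}

/* Hypothetical helper: run one scalar triple through the vector code and
   return the lane-0 mask (0xFFFF for true, 0 for false). */
static inline uint16_t cmpge_sum_u16_lane0(uint16_t c, uint16_t a, uint16_t b)
{
    __m128i m = cmpge_sum_epu16(_mm_set1_epi16((short)c),
                                _mm_set1_epi16((short)a),
                                _mm_set1_epi16((short)b));
    return (uint16_t)_mm_extract_epi16(m, 0);
}
```

The explicit C >= A test matters: without it, a lane with C < A and B == 0 would compare the saturated difference 0 against 0 and wrongly report true.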
4
votes
2 answers

How do I enable SSE4.1 and SSE3 (but NOT AVX) in MSVC

I am trying to enable different levels of SIMD support in MSVC. There is a page about enabling some of them, such as SSE2, AVX, and AVX2: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?redirectedfrom=MSDN&view=vs-2019 However, it does…
knightyangpku
  • 75
  • 1
  • 4
4
votes
2 answers

Does a processor that supports SSE4 support SSSE3 instructions?

I am developing a hardware platform that requires the SSSE3 instruction set. When looking at a processor such as the Intel Atom® x5-Z8350 the datasheet says it has support for SSE4.1 and SSE4.2. Would this allow software written for SSSE3…
Eric Johnson
  • 205
  • 2
  • 14
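For the SSSE3 question above, each extension has its own CPUID feature bit, so the safest approach is to test for SSSE3 directly rather than infer it. The sketch below assumes GCC or Clang on x86, where __builtin_cpu_supports is available; the function name sse4_implies_ssse3_here is hypothetical. Note that AMD's SSE4a is a different extension and says nothing about SSSE3.

```c
#include <stdbool.h>

/* Hypothetical name: returns false only if this CPU reports SSE4.1 or
   SSE4.2 without also reporting SSSE3. Assumes GCC or Clang on x86. */
static bool sse4_implies_ssse3_here(void)
{
    __builtin_cpu_init();  /* populate the CPU feature data */
    bool sse41 = __builtin_cpu_supports("sse4.1") != 0;
    bool sse42 = __builtin_cpu_supports("sse4.2") != 0;
    bool ssse3 = __builtin_cpu_supports("ssse3") != 0;
    return !(sse41 || sse42) || ssse3;
}
```

Software that requires SSSE3 can call __builtin_cpu_supports("ssse3") once at startup and refuse to run, or fall back to another code path, when it returns zero.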
4
votes
2 answers

How do I enable the SSE4.2 instruction set in Visual C++?

I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images. The paper on the BRIEF descriptor says that it is possible to speed things up: "The BRIEF descriptor uses hamming distance, which can be done…
Fredrik
  • 141
  • 2
  • 5
3
votes
1 answer

Intrinsic inverse to _mm_movemask_epi8

So first I'll just describe the task: I need to: Compare two __m128i. Somehow do the bitwise and of the result with a certain uint16_t value (probably using _mm_movemask_epi8 first and then just &). Do the blend of the initial values based on the…
Andrew S.
  • 467
  • 3
  • 12
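For the inverse-movemask question above, a 16-bit mask can be expanded back into a byte mask with SSE2 alone. This is a minimal sketch under the assumption that bit i of the mask should control byte i, as with _mm_movemask_epi8; the name mask16_to_bytes is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: byte i of the result is 0xFF if bit i of mask is set,
   otherwise 0x00. */
static inline __m128i mask16_to_bytes(uint16_t mask)
{
    /* Replicate the low mask byte into bytes 0..7 and the high mask byte
       into bytes 8..15. */
    const uint64_t rep = 0x0101010101010101ULL;
    uint64_t lo = (uint64_t)(mask & 0xFF) * rep;
    uint64_t hi = (uint64_t)(mask >> 8) * rep;
    __m128i v = _mm_set_epi64x((long long)hi, (long long)lo);

    /* Byte i keeps only bit (i % 8), then is widened to 0xFF or 0x00. */
    const __m128i bit = _mm_set1_epi64x((long long)0x8040201008040201ULL);
    v = _mm_and_si128(v, bit);
    return _mm_cmpeq_epi8(v, bit);
}
```

The resulting vector can be ANDed with a comparison result and then used as the selector for _mm_blendv_epi8 on SSE4.1 hardware, or with the and/andnot/or idiom on plain SSE2.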
3
votes
1 answer

Is there a way to cast integers to bytes, knowing these ints are in range of bytes. Using SSE?

In an xmm register I have 3 integers with values less than 256. I want to cast these to bytes, and save them to memory. I don't know how to approach it. I was thinking about getting those numbers from xmm1 and saving them to eax, then moving the…
thomas113412
  • 67
  • 1
  • 4
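For the integers-to-bytes question above, two pack instructions narrow 32-bit lanes to bytes without a round trip through general-purpose registers for each value. The sketch below uses SSE2 intrinsics rather than the assembly from the question; the name store3_epi32_as_bytes is hypothetical, and it relies on the stated assumption that the three values are below 256.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical name: stores the three lowest 32-bit lanes of v as three
   bytes at dst. Values are assumed to be in 0..255. */
static inline void store3_epi32_as_bytes(uint8_t *dst, __m128i v)
{
    __m128i w = _mm_packs_epi32(v, v);   /* 32 -> 16 bits, signed saturation */
    __m128i b = _mm_packus_epi16(w, w);  /* 16 -> 8 bits, unsigned saturation */
    int32_t low = _mm_cvtsi128_si32(b);  /* lanes 0..3 as four packed bytes */
    memcpy(dst, &low, 3);                /* write only the first three */
}
```

Copying exactly three bytes keeps the store from touching the byte after the destination, at the cost of a small memcpy instead of a single 32-bit store.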
3
votes
3 answers

SSE mov instruction that can skip every 2nd byte?

I need to copy all the odd numbered bytes from one memory location to another. i.e. copy the first, third, fifth etc. Specifically I'm copying from the text area 0xB8000 which contains 2000 character/attribute words. I want to skip the attribute…
poby
  • 1,572
  • 15
  • 39
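For the byte-skipping question above, there is no single SSE move that strides over bytes, but masking and packing achieves the same effect. The sketch below keeps the first byte of every two-byte word (the character, dropping the attribute) using SSE2 only; the name copy_even_bytes is hypothetical and the intrinsics stand in for the assembly the question asks about.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: copies src[0], src[2], src[4], ... into dst.
   n_words is the number of two-byte words in src. */
static void copy_even_bytes(uint8_t *dst, const uint8_t *src, size_t n_words)
{
    const __m128i low_byte = _mm_set1_epi16(0x00FF);
    size_t i = 0;

    /* 16 words (32 source bytes) per iteration. */
    for (; i + 16 <= n_words; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + 2 * i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + 2 * i + 16));
        a = _mm_and_si128(a, low_byte);  /* clear the second byte of each word */
        b = _mm_and_si128(b, low_byte);
        /* Each word is now 0..255, so the saturating pack is lossless. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(a, b));
    }

    /* Scalar tail for any remaining words. */
    for (; i < n_words; ++i)
        dst[i] = src[2 * i];
}
```

With the 2000 words mentioned in the question, the vector loop runs exactly 125 times and the scalar tail is never entered.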
2
votes
1 answer

What does "SSE 4.2 insanity" mean in the "if consteval" proposal paper?

I was reading a C++ paper on if consteval (§3.2), and saw code showing a constexpr strlen implementation: constexpr size_t strlen(char const* s) { if constexpr (std::is_constant_evaluated()) { for (const char *p = s; ; ++p) { …
Chi_Iroh
  • 1,061
  • 5
  • 14
2
votes
1 answer

How can I get gcc to vectorize code using the SSE4.1 pminuq/pminud/etc opcodes?

I've been using the excellent godbolt.org to determine what gcc does and doesn't vectorize, but I can't work out any way of getting it to vectorize a min(X,Y) function into a PMINUQ etc. Looking at the sse.md machine description language file in the…
nickpelling
  • 119
  • 9
2
votes
1 answer

How to change the CPU instruction set which VirtualBox emulated for guest OS, like disabling SSE4.2 instruction set?

What I want to achieve is disabling the SSE4.2 instruction set on the CPU that VirtualBox emulates for my Linux guest OS, for debugging purposes, even though the real CPU that VirtualBox runs on supports SSE4.2. I referred to the…
cong
  • 1,105
  • 1
  • 12
  • 29
2
votes
1 answer

How much faster are SSE4.2 string instructions than SSE2 for memcmp?

Here is my assembly code. Can you embed it in C++ and benchmark it against SSE4? I would very much like to see how much speed SSE4 has gained, or whether it is not worth worrying about at all. Let's check (I have no support above SSSE3) { sse2…
Exile
  • 29
  • 1
  • 3
2
votes
1 answer

how to copy bytes into xmm0 register

I have the following code, which works fine but seems inefficient given that the end result only requires the data in xmm0: mov rcx, 16 ; get first word, up to 16 bytes mov rdi, CMD ; ...and put…
poby
  • 1,572
  • 15
  • 39