Questions tagged [sse4]

Intel's Streaming SIMD Extensions 4 instruction set for Intel Core architecture x86 processors and AMD's K10 x86 processors. Intel's SSE4 introduces 54 new instructions in total: 47 in SSE4.1 and 7 in SSE4.2.

These instructions encompass Intel's SSE4.1 and SSE4.2 instruction sets as well as AMD's SSE4a instruction set. More detailed information on the new instructions can be found in both Intel's and AMD's developer manuals or, more conveniently, on Wikipedia.

55 questions
1
vote
1 answer

fast compact register using sse

I am trying to figure out how to use the SSE intrinsic _mm_shuffle_epi8 to compact a 128-bit register. Let's say I have an input variable __m128i target which is basically eight 16-bit values, indicated as a[0], a[1] ... a[7] (each slot is 16 bits). My output is…
WhatABeautifulWorld
  • 3,198
  • 3
  • 22
  • 30
0
votes
0 answers

Auto-vectorization for hand-unrolled initialized tiled-computation versus simple loop with no initialization

While optimizing the innermost 4-versus-4 comparison part of an AABB collision detection algorithm, I am stuck at simplifying the code while at the same time gaining (or just retaining) performance. Here is the version with hand-unrolled…
huseyin tugrul buyukisik
  • 11,469
  • 4
  • 45
  • 97
0
votes
0 answers

Intel Intrinsics Comparing Two Strings

I am attempting to build a header parser with fast processing. I have two issues; one is that there is a bug in the code below. void parse_with_simd(const char *buffer, const int buffer_len) { const char * value = "GET "; __m128i u_str =…
0
votes
1 answer

Undefined intel_sse4_strlen

I am running into an issue. My program compiled without problems, but when I ran it I got an error that I could not figure out. I ran "nm -u 64rm | grep intel" and got the following: How do I compile or what should I do for these API defined…
inflator
  • 39
  • 4
0
votes
1 answer

Is it beneficial to use glibc's strlen()/strcmp() or roll your own based on SSE4.2?

According to "Schema Validation with Intel® Streaming SIMD Extensions 4 (Intel® SSE4)" (Intel, 2008) [they] added instructions to assist in character searches and comparison on two operands of 16 bytes at a time. I wrote some basic strlen() and…
user1016031
  • 123
  • 1
  • 7
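For context, a minimal sketch of the kind of SSE4.2 strlen() the question describes, not the asker's code. The helper name sse42_strnlen16 and the requirement that the buffer capacity be a multiple of 16 are assumptions made here so that every 16-byte load stays inside the buffer. The target attribute is GCC/Clang specific; with other setups, compile with SSE4.2 enabled instead.

```c
#include <stddef.h>
#include <nmmintrin.h>  /* SSE4.2 string intrinsics */

/* Byte-wise "equal each" comparison; return the lowest matching index. */
#define STRLEN_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper: length of the string in buf, scanning at most cap
 * bytes. Assumes cap is a multiple of 16 so no load reads past buf. */
__attribute__((target("sse4.2")))
size_t sse42_strnlen16(const char *buf, size_t cap)
{
    /* An all-zero first operand is an empty implicit-length string. */
    const __m128i zero = _mm_setzero_si128();

    for (size_t off = 0; off + 16 <= cap; off += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + off));
        /* Positions past the end of BOTH strings compare as equal, so the
         * first match is the position of the NUL byte in chunk. The result
         * is 16 when chunk holds no NUL byte. */
        int idx = _mm_cmpistri(zero, chunk, STRLEN_MODE);
        if (idx < 16)
            return off + (size_t)idx;
    }
    return cap;  /* no terminator found within cap bytes */
}
```

Whether this beats glibc depends on the CPU and on string lengths, so it should be benchmarked rather than assumed faster.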
0
votes
1 answer

How does the _mm_cmpgt_epi64 intrinsic work

I'm using the _mm_cmpgt_epi64 intrinsic to implement a 128-bit addition, and later a 256-bit one. Looking at the result of this intrinsic something puzzles me. I don't understand why the computed mask is the way it is. const __m128i mask =…
elmattic
  • 12,046
  • 5
  • 43
  • 79
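For context, a small sketch of what _mm_cmpgt_epi64 computes, not the asker's code. The instruction compares each 64-bit lane as a signed integer and sets the lane to all ones when a > b, otherwise to zero. Carry detection in multi-word addition needs an unsigned comparison, which can be emulated by flipping the sign bit of both operands first. The helper names below are hypothetical, and the target attribute is GCC/Clang specific.

```c
#include <stdint.h>
#include <nmmintrin.h>  /* SSE4.2: _mm_cmpgt_epi64 */

/* Hypothetical helper: per-lane signed a > b.
 * Each output lane is all ones if true, zero if false. */
__attribute__((target("sse4.2")))
void cmpgt64_lanes(const int64_t a[2], const int64_t b[2], uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_cmpgt_epi64(va, vb));
}

/* Hypothetical helper: per-lane unsigned a > b.
 * XOR with the sign bit maps unsigned order onto signed order. */
__attribute__((target("sse4.2")))
void cmpgt64u_lanes(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
    const __m128i sign = _mm_set1_epi64x(INT64_MIN);
    __m128i va = _mm_xor_si128(_mm_loadu_si128((const __m128i *)a), sign);
    __m128i vb = _mm_xor_si128(_mm_loadu_si128((const __m128i *)b), sign);
    _mm_storeu_si128((__m128i *)out, _mm_cmpgt_epi64(va, vb));
}
```

With the signed version, a lane holding -1 is not greater than a lane holding 0, even though its bit pattern is the largest unsigned value. That difference is a common source of surprising masks.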
0
votes
1 answer

Using SSE4.2 instruction PCMPESTRM with small patterns

I am trying to use some SSE4.2 instructions in string matching algorithms, coded in C++. I do not understand how to use these instructions to match smaller patterns, and was hoping somebody could help me out with that. In the code example, I am…
cmperezg
  • 145
  • 11
0
votes
0 answers

Is _mm_load_ps a requirement for 128bit aligned structure?

I have a vector structure set up similar to this. It is 128-bit aligned, just like the __m128 type. struct Vector3 { union { float v[4]; struct { float x, y, z, w; }; }; }; I am using the SSE 4.1 Dot product instruction…
Haydn Trigg
  • 105
  • 1
  • 6
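For context, a minimal sketch of loading a 16-byte-aligned float[4] and taking a three-component dot product with the SSE4.1 instruction, not the asker's code. _mm_load_ps requires a 16-byte-aligned address, while _mm_loadu_ps accepts any address. The Vec4 type and dot3 function are hypothetical stand-ins for the question's struct, and the target attribute is GCC/Clang specific.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_dp_ps */

/* Hypothetical stand-in for the question's struct, forced to 16-byte alignment. */
typedef struct {
    _Alignas(16) float v[4];  /* x, y, z, w */
} Vec4;

/* Dot product of the x, y, z components; w is ignored. */
__attribute__((target("sse4.1")))
float dot3(const Vec4 *a, const Vec4 *b)
{
    /* Aligned loads are valid here because Vec4 is 16-byte aligned. */
    __m128 va = _mm_load_ps(a->v);
    __m128 vb = _mm_load_ps(b->v);
    /* Mask 0x71: multiply lanes 0..2, write the sum to lane 0 only. */
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0x71));
}
```

If the alignment of the data cannot be guaranteed, replacing _mm_load_ps with _mm_loadu_ps keeps the code correct.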
0
votes
1 answer

How to get the char-by-char comparison table of two strings with SSE 4.2?

How can I get the char-by-char comparison table of two strings with SSE 4.2 intrinsics in C? _mm_cmpistrm returns a mask of the significant bits, that is, the result of an aggregating function applied to the char-by-char comparison table. __m128i _mm_cmpistrm ( __m128i a, …
udjin
  • 31
  • 2
-3
votes
4 answers

Is it reasonable to have SSE 4.2 on 64-bit processor?

SSE 4.2 performs comparison on two operands of 16 bytes at a time. But it is also possible to compare two operands of 8 bytes at a time with ordinary processor instructions. The difference is not so large as to warrant a special hardware implementation of…
Nelson Tatius
  • 7,693
  • 8
  • 47
  • 70