Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk, or vector, of data elements. CPU vector instruction sets include x86 SSE and AVX, ARM NEON, and PowerPC AltiVec. To use SIMD instructions efficiently, data needs to be in structure-of-arrays form and should occur in longer streams. Code that is naively "SIMD optimized" frequently surprises by running slower than the original.
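A minimal sketch (not part of the tag description) of the array-of-structures vs. structure-of-arrays distinction mentioned above; the names are illustrative only:

// AoS: the x values of successive points are strided in memory.
struct PointAoS { float x, y, z; };
// SoA: all x values are contiguous, so one SSE load fills a register
// with four useful elements.
struct PointsSoA {
    float x[1024];
    float y[1024];
    float z[1024];
};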
Questions tagged [simd]
2540 questions
2 votes, 1 answer
How to achieve an 8-bit madd using SSE2
Reading from the official Intel C++ Intrinsics Reference, SSE2 has the following intrinsic:
__m128i _mm_madd_epi16(__m128i a, __m128i b)
Multiplies the 8 signed 16-bit integers from a by the 8 signed 16-bit integers from b.
Adds the signed 32-bit…

adkalkan
- 69
- 1
- 7
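One possible SSE2-only approach, sketched here rather than taken from the answer: sign-extend the bytes to 16-bit lanes and reuse _mm_madd_epi16. The function name is made up for illustration.

#include <emmintrin.h>  // SSE2

__m128i madd_epi8_sse2(__m128i a, __m128i b)
{
    // SSE2 has no pmovsxbw, so sign-extend by duplicating each byte into a
    // 16-bit lane and arithmetic-shifting it back down.
    __m128i a_lo = _mm_srai_epi16(_mm_unpacklo_epi8(a, a), 8);
    __m128i b_lo = _mm_srai_epi16(_mm_unpacklo_epi8(b, b), 8);
    __m128i a_hi = _mm_srai_epi16(_mm_unpackhi_epi8(a, a), 8);
    __m128i b_hi = _mm_srai_epi16(_mm_unpackhi_epi8(b, b), 8);
    // Each 32-bit lane of the result accumulates four signed 8-bit products
    // (two from the low half plus two from the high half), which is fine
    // when the final goal is a horizontal sum.
    return _mm_add_epi32(_mm_madd_epi16(a_lo, b_lo),
                         _mm_madd_epi16(a_hi, b_hi));
}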
2 votes, 1 answer
Build 2^n in double/simd
I am trying to build 2^n using the double representation. The trick is (well) known
// tips to calculate 2^n using the exponent of the double IEEE representation
union ieee754{
double d;
uint32_t i[2];
};
// Converts an unsigned long long…

Timocafé
- 765
- 6
- 18
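A sketch of the usual trick, assuming n stays in the normal exponent range (-1022..1023). memcpy is used instead of the union to avoid type-punning problems in C++, and the SSE2 variant is an assumption, not the asker's code.

#include <cstdint>
#include <cstring>
#include <emmintrin.h>

double pow2_scalar(int64_t n)
{
    // Write the biased exponent (n + 1023) into bits 52..62 of a double.
    uint64_t bits = static_cast<uint64_t>(n + 1023) << 52;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// The same idea for two exponents at once.
__m128d pow2_sse2(__m128i n)  // two signed 64-bit exponents
{
    __m128i biased = _mm_add_epi64(n, _mm_set1_epi64x(1023));
    return _mm_castsi128_pd(_mm_slli_epi64(biased, 52));
}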
2 votes, 1 answer
SIMD integer store
I am writing a program using SSE instructions to multiply and add integer values. I did the same program with floats, but I am missing an instruction for my integer version.
With floats, after I have finished all my operations, I return the values…

Thudor
- 349
- 2
- 7
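For reference, a sketch of the integer counterparts of _mm_store_ps: _mm_store_si128 (aligned) and _mm_storeu_si128 (unaligned). The surrounding arithmetic is only illustrative.

#include <emmintrin.h>
#include <cstdint>

void add_and_store(int32_t* dst)  // dst assumed 16-byte aligned
{
    __m128i a = _mm_set_epi32(4, 3, 2, 1);
    __m128i b = _mm_set1_epi32(10);
    __m128i r = _mm_add_epi32(a, b);
    _mm_store_si128(reinterpret_cast<__m128i*>(dst), r);  // aligned store
    // _mm_storeu_si128 is the unaligned form, analogous to _mm_storeu_ps.
}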
2 votes, 2 answers
When does data move around between SSE registers and the stack?
I'm not exactly sure what happens when I call _mm_load_ps. I know it loads an array of 4 floats into a __m128, which I can use for SIMD-accelerated arithmetic and then store back, but isn't this __m128 data type still on the stack? I…

ulak blade
- 2,515
- 5
- 37
- 81
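A small illustrative sketch (assuming an optimizing build and a 16-byte-aligned pointer): a __m128 local normally lives in an XMM register and only spills to the stack under register pressure or in unoptimized builds.

#include <xmmintrin.h>

float sum4(const float* p)
{
    __m128 v = _mm_load_ps(p);    // one movaps load from memory
    __m128 s = _mm_add_ps(v, v);  // stays in an XMM register, no stack traffic
    float out[4];
    _mm_storeu_ps(out, s);        // an explicit store back to memory
    return out[0] + out[1] + out[2] + out[3];
}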
2 votes, 1 answer
Reverse a string using SSE
How do we reverse a string using SSE? This concept is new to me, so please give me some information about it. The reason is that someone said using SSE would speed up the code at run time.
I have searched for SSE, which is __m128, but don't…

Squall Leonahart
- 31
- 2
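One way to reverse 16 bytes at a time is a byte shuffle; note that _mm_shuffle_epi8 needs SSSE3, and a full string reverse would still have to swap blocks from the two ends of the buffer. A sketch with an illustrative name:

#include <tmmintrin.h>  // SSSE3

__m128i reverse16(__m128i v)
{
    // Shuffle control: result byte i takes source byte 15 - i.
    const __m128i idx = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
    return _mm_shuffle_epi8(v, idx);
}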
2 votes, 1 answer
Where to initialize SSE constants
My question is about the most efficient place to define __m128/__m128i compile-time constants in intrinsics-based code.
Considering two options:
Option A
__m128i Foo::DoMasking(const __m128i value) const
{
//defined in method
const __m128i…

Rotem
- 21,452
- 6
- 62
- 109
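A sketch of the two common options, assuming the mask value itself is irrelevant here. With optimization on, a method-local _mm_set1_epi32 constant is typically materialized as a single load from a read-only 16-byte literal, so Option A usually costs little; a function-local static is initialized once but adds a guard check in C++.

#include <emmintrin.h>

// Option A: defined in the method; the compiler normally hoists it.
__m128i do_masking_local(__m128i value)
{
    const __m128i mask = _mm_set1_epi32(0x00FF00FF);
    return _mm_and_si128(value, mask);
}

// Option B: a function-local static, initialized on first call.
__m128i do_masking_static(__m128i value)
{
    static const __m128i mask = _mm_set1_epi32(0x00FF00FF);
    return _mm_and_si128(value, mask);
}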
2 votes, 2 answers
Extract 4 SSE integers to 4 chars
Suppose I have a __m128i containing 4 32-bit integer values.
Is there some way I can store it inside a char[4], where the low byte of each int value is stored in one char?
Desired result:
r1 r2 r3 …

Rotem
- 21,452
- 6
- 62
- 109
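One possible approach (SSSE3, not plain SSE2), sketched with illustrative names: a byte shuffle gathers the low byte of each 32-bit lane into the low dword, which is then stored as 4 chars. With only SSE2 one would use the pack instructions, which saturate.

#include <tmmintrin.h>
#include <cstring>

void low_bytes(__m128i v, char out[4])
{
    // Pick bytes 0, 4, 8, 12 (the low byte of each int); -1 zeroes a lane.
    const __m128i idx = _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1,
                                     -1, -1, -1, -1, 12, 8, 4, 0);
    __m128i packed = _mm_shuffle_epi8(v, idx);
    int low = _mm_cvtsi128_si32(packed);  // the 4 wanted bytes
    std::memcpy(out, &low, 4);
}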
2 votes, 1 answer
Can I store only 96 bit of 128 with SSE instructions?
_mm_store_ps stores (for example) 128 bits into 4 float elements of an array.
Can I store only 96 bits? Or rather, only the first 3 values into 3 elements of the array (with SSE instructions)?
I explained myself badly: I do not want to mask the bits. I would like…

user2120196
- 61
- 4
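A sketch of one common way to store just the low three floats (96 bits) without touching the fourth element's memory; the name is illustrative.

#include <xmmintrin.h>

void store3(float* dst, __m128 v)
{
    _mm_storel_pi(reinterpret_cast<__m64*>(dst), v);  // store the low 2 floats
    __m128 third = _mm_movehl_ps(v, v);               // lane 0 now holds v[2]
    _mm_store_ss(dst + 2, third);                     // store the 3rd float
}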
2 votes, 1 answer
How to use ARM NEON vbit intrinsics?
I don't understand how to differentiate between vbit, vbsl and vbif with NEON intrinsics. I need to do the vbit operation, but if I use the vbslq intrinsic I don't get what I want.
For example, I have a source vector like…

user1926328
- 147
- 2
- 10
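For orientation, a sketch assuming NEON intrinsics in C: vbslq covers VBSL, VBIT and VBIF, which differ only in which register acts as the mask and which one is overwritten; with intrinsics the compiler chooses the encoding.

#include <arm_neon.h>

uint8x16_t select_bits(uint8x16_t mask, uint8x16_t a, uint8x16_t b)
{
    // Bitwise select: (a & mask) | (b & ~mask).
    return vbslq_u8(mask, a, b);
}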
2 votes, 1 answer
Unable to activate the SSE instruction set with "-march=native" or any other gcc flags on a Core2 chip
My machine has a Core2 microarchitecture, and I tried to compile some arithmetic code targeting the SSE instruction set. I searched the web and the official manual, and I believe that all I need to do is add the flag -march=native, because my chip…

user2719257
- 31
- 3
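A minimal test case (an assumption, not the asker's code) for GCC auto-vectorization. Core 2 supports SSE up to SSSE3, so either -march=native on that host or an explicit -march=core2 should allow SSE code generation; inspecting the assembly produced with -S for instructions like addps/mulps confirms it.

// Build with, e.g.:  g++ -O3 -march=native -S axpy.cpp
//               or:  g++ -O3 -march=core2  -S axpy.cpp
void axpy(float* __restrict y, const float* __restrict x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}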
2 votes, 2 answers
Which registers do x86/x64 processors use for floating point math?
Does x86/x64 use SIMD registers for high-precision floating point operations, or dedicated FP registers?
I mean the high-precision version, not regular double precision.
user2341104
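A tiny illustrative sketch (assuming the x86-64 System V ABI): regular float/double arithmetic uses the SSE XMM registers, whereas 80-bit long double, the "high precision" case, still goes through the x87 register stack st(0)..st(7).

double add_d(double a, double b) { return a + b; }                  // addsd, XMM registers
long double add_ld(long double a, long double b) { return a + b; }  // x87 fadd, st(i) registers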
2 votes, 3 answers
Speeding up Newton's Method for finding nth root
Let me preface this question with a statement: this code works as intended, but it is very, very slow for what it is. Is there a way to make Newton's method converge faster, or a way to set a __m256 var equal to a single float without…

Mercutio Calviary
- 184
- 10
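A sketch of both parts of the question, with illustrative names: _mm256_set1_ps broadcasts a single float into every lane of a __m256, and one vectorized Newton step for the cube root (n = 3) uses x_next = (2*x + a/x^2) / 3.

#include <immintrin.h>

__m256 cbrt_newton_step(__m256 x, __m256 a)
{
    const __m256 two   = _mm256_set1_ps(2.0f);        // broadcast a scalar
    const __m256 third = _mm256_set1_ps(1.0f / 3.0f);
    __m256 x2 = _mm256_mul_ps(x, x);
    return _mm256_mul_ps(third,
                         _mm256_add_ps(_mm256_mul_ps(two, x),
                                       _mm256_div_ps(a, x2)));
}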
2 votes, 1 answer
_mm256_testz_pd not working?
I'm working on a Core i7 on Linux and using g++ 4.6.3.
I tried the following code:
#include <immintrin.h>
#include <iostream>
int main() {
    __m256d a = _mm256_set_pd(1,2,3,4);
    __m256d z = _mm256_setzero_pd();
    std::cout << _mm256_testz_pd(a,a) <<…

Ming
- 365
- 2
- 12
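For context, a hedged note and sketch: _mm256_testz_pd only examines the sign bits (it returns 1 when a AND b has no sign bit set in any lane), so it is not an "is this vector all zero?" test. One way to test for all-zero doubles:

#include <immintrin.h>

int all_zero(__m256d v)
{
    __m256d eq = _mm256_cmp_pd(v, _mm256_setzero_pd(), _CMP_EQ_OQ);
    return _mm256_movemask_pd(eq) == 0xF;  // all four lanes compared equal to 0
}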
2 votes, 1 answer
Dynamically allocate SIMD Vector as array of doubles
I'm new to vectors and I've been having a read of the gcc documentation trying to get my head around it.
Is it possible to dynamically allocate the size of a vector at run time? It appears as though you have to do this in the typedef like:
typedef…

samturner
- 2,213
- 5
- 25
- 31
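A sketch assuming GCC's vector_size extension is what is meant: the vector type's width is fixed at compile time, so the usual pattern is to allocate a runtime-sized, aligned array of doubles and walk it in fixed-width chunks (C++17 std::aligned_alloc here; names are illustrative).

#include <cstdlib>
#include <cstddef>

typedef double v4d __attribute__((vector_size(32)));  // 4 doubles, size fixed at compile time

void scale(std::size_t n, double s)
{
    // Round the byte count up to a multiple of the alignment, as
    // std::aligned_alloc requires.
    std::size_t bytes = ((n * sizeof(double) + 31) / 32) * 32;
    double* data = static_cast<double*>(std::aligned_alloc(32, bytes));
    // ... fill data ...
    v4d vs = { s, s, s, s };
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4d v = *reinterpret_cast<v4d*>(data + i);   // one aligned 32-byte chunk
        *reinterpret_cast<v4d*>(data + i) = v * vs;  // element-wise multiply
    }
    for (; i < n; ++i)
        data[i] *= s;                                // scalar tail
    std::free(data);
}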
2 votes, 1 answer
Loading non-contiguous floats using SSE
Is there an Intel SSE instruction which can load floats from (non-contiguous) evenly spaced memory addresses?
For example given an array A = {0, 1, 2, 3 .... n}, I would like to load into a 128 bit register at once {A[0], A[4], A[8], A[12]},…

jaynp
- 3,275
- 4
- 30
- 43
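A sketch of the usual answers, with illustrative names: plain SSE has no gather instruction, so strided elements are typically collected with scalar loads that the compiler turns into a load-and-shuffle sequence; AVX2 adds a true gather, _mm_i32gather_ps.

#include <immintrin.h>

__m128 load_stride4(const float* a)
{
    // _mm_set_ps lists lanes from high to low, so lane 0 receives a[0].
    return _mm_set_ps(a[12], a[8], a[4], a[0]);
}

#ifdef __AVX2__
__m128 load_stride4_gather(const float* a)
{
    const __m128i idx = _mm_set_epi32(12, 8, 4, 0);  // 32-bit element indices
    return _mm_i32gather_ps(a, idx, 4);              // scale = sizeof(float)
}
#endif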