Questions tagged [simd]

Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk or vector of data elements. CPU vector instruction sets include: x86 SSE and AVX, ARM NEON, and PowerPC AltiVec. To efficiently use SIMD instructions, data needs to be in structure-of-arrays form and should occur in longer streams. Naively "SIMD optimized" code frequently surprises by running slower than the original.
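To illustrate what "structure-of-arrays form" means in the description above, here is a minimal sketch in plain C. The type names `PointAoS` and `PointsSoA`, the size `N_POINTS`, and the helper functions are hypothetical and chosen only for this example. It shows the layout difference and a unit-stride loop that a compiler may be able to auto-vectorize; whether it actually does depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical element count, chosen only for this illustration. */
enum { N_POINTS = 8 };

/* Array-of-structures: x, y and z of one point sit next to each other,
   so the x values of consecutive points are not contiguous in memory. */
typedef struct {
    float x, y, z;
} PointAoS;

/* Structure-of-arrays: all x values are contiguous, then all y, then all z.
   A single vector load can then fetch several consecutive x values. */
typedef struct {
    float x[N_POINTS];
    float y[N_POINTS];
    float z[N_POINTS];
} PointsSoA;

/* Convert an AoS array of N_POINTS points into the SoA layout. */
static void aos_to_soa(const PointAoS *in, PointsSoA *out) {
    for (size_t i = 0; i < N_POINTS; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Scale every x coordinate. The loop walks one contiguous float array,
   which is the access pattern SIMD loads and stores are designed for. */
static void scale_x(PointsSoA *p, float s) {
    for (size_t i = 0; i < N_POINTS; ++i) {
        p->x[i] *= s;
    }
}
```

With the AoS layout, the same scaling loop would touch every third float, which usually forces shuffles or gathers before vector arithmetic can be applied.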

2540 questions

2 answers

inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch

I am trying to compile a C program that uses SIMD intrinsics with CMake. When I try to compile it, I get two errors: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining failed in call to always_inline ‘_mm_mullo_epi32’:…

asked Mar 30 '17 at 21:29

Lawan subba

2 answers

Do all 64-bit Intel architectures support SSSE3/SSE4.1/SSE4.2 instructions?

I searched the web and the Intel software manual, but I am unable to confirm whether all Intel 64 architectures support up to SSSE3, SSE4.1, SSE4.2, AVX, etc. I want to know this so that I can use the minimum supported SIMD instruction set in my program.…

x86-64 intel cpu-architecture simd

asked Jan 28 '15 at 06:14

Vikram Dattu

3 answers

Fastest way to do horizontal vector sum with AVX instructions

I have a packed vector of four 64-bit floating-point values. I would like to get the sum of the vector's elements. With SSE (and using 32-bit floats) I could just do the following: v_sum = _mm_hadd_ps(v_sum, v_sum); v_sum = _mm_hadd_ps(v_sum,…

x86 sse simd avx vector-processing

asked Mar 19 '12 at 18:11

Luigi Castelli

2 answers

Does browser JavaScript allow for SIMD or Vectorized operations?

I want to write applications in JavaScript that require a large amount of numerical computation. However, I'm very confused about the state of efficient linear-algebra-like computation in client-side JavaScript. There seem to be many approaches,…

javascript matrix vector vectorization simd

asked Mar 21 '17 at 02:51

Seanny123

5 answers

Optimizing Array Compaction

Let's say I have an array k = [1 2 0 0 5 4 0]. I can compute a mask as follows: m = k > 0 = [1 1 0 0 1 1 0]. Using only the mask m and the following operations: shift left/right, and/or, add/subtract/multiply, I can compact k into the following: [1 2 5…

algorithm matlab sse simd

asked Oct 25 '11 at 08:30

jameszhao00

8 answers

C++ SSE SIMD framework

Does anyone know an open-source C++ x86 SIMD intrinsics library? Intel supplies exactly what I need in their integrated performance primitives library, but I can't use that because of the copyrights all over the place. EDIT I already know the…

c++ sse simd intrinsics

asked Feb 10 '11 at 03:42

user283145

1 answer

Fastest way to compute absolute value using SSE

I am aware of 3 methods, but as far as I know, only the first 2 are generally used: Mask off the sign bit using andps or andnotps. Pros: One fast instruction if the mask is already in a register, which makes it perfect for doing this many times in…

x86 vectorization sse simd absolute-value

asked Sep 05 '15 at 01:29

Kumputer
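The first method mentioned in the excerpt above, masking off the sign bit, can be sketched in portable scalar C. This is only an illustration of the bit trick that `andps` applies to every lane at once, not the SSE code from the question; it assumes `float` is a 32-bit IEEE-754 value, and the function names `abs_by_mask` and `abs_array` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clear bit 31 (the IEEE-754 sign bit) of a 32-bit float.
   Assumption: float is a 32-bit IEEE-754 single-precision value. */
static float abs_by_mask(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);   /* reinterpret the float's bits safely */
    bits &= UINT32_C(0x7FFFFFFF);     /* keep everything except the sign bit */
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Apply the mask to every element. A SIMD version would perform the same
   AND on several lanes per instruction, with the mask held in a register. */
static void abs_array(float *data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = abs_by_mask(data[i]);
    }
}
```

Because the operation is a plain bitwise AND, it also turns -0.0 into +0.0 and leaves the payload of a NaN untouched apart from its sign.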

6 answers

How to use the Intel AVX in Java?

How do I use the Intel AVX vector instruction set from Java? It's a simple question but the answer seems to be hard to find.

java simd avx

asked Dec 27 '14 at 09:17

Albert Hendriks

5 answers

Transpose an 8x8 float using AVX/AVX2

Transposing an 8x8 matrix can be achieved by making four 4x4 matrices, and transposing each of them. This is not what I'm going for. In another question, one answer gave a solution that would only require 24 instructions for an 8x8 matrix. However,…

simd avx avx2

asked Sep 02 '14 at 11:51

DavidS

5 answers

How to combine two m128 values to m256?

I would like to combine two __m128 values to one __m256. Something like this: __m128 a = _mm_set_ps(1, 2, 3, 4); __m128 b = _mm_set_ps(5, 6, 7, 8); to something like: __m256 c = { 1, 2, 3, 4, 5, 6, 7, 8 }; Are there any intrinsics that I can…

c x86 sse simd avx

asked Jun 20 '12 at 09:40

user1468756

5 answers

SIMD prefix sum on Intel CPU

I need to implement a prefix sum algorithm and would need it to be as fast as possible. Ex: [3, 1, 7, 0, 4, 1, 6, 3] should give: [3, 4, 11, 11, 15, 16, 22, 25]. Is there a way to do this using SSE SIMD CPU instructions? My first idea is to…

c++ sse simd prefix-sum

asked May 14 '12 at 16:44

skyde
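For reference, the inclusive prefix sum described in the excerpt above can be sketched in plain C. The first function is a scalar baseline. The second emulates, in scalar code, a common shift-and-add scheme on 4-element blocks, which is one way such a scan is often mapped onto 4-lane vector registers. It is not necessarily the approach the asker had in mind, and all function names here are hypothetical.

```c
#include <stddef.h>

/* Scalar baseline: out[i] = in[0] + in[1] + ... + in[i]. */
static void prefix_sum_scalar(const int *in, int *out, size_t n) {
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* In-place inclusive scan of one 4-lane block in two steps:
   add a copy shifted up by 1 lane, then a copy shifted up by 2 lanes,
   shifting in zeros. A SIMD version would use lane shifts and vector adds. */
static void scan4(int v[4]) {
    int t[4];

    t[0] = 0; t[1] = v[0]; t[2] = v[1]; t[3] = v[2];   /* shift by 1 lane */
    for (int i = 0; i < 4; ++i) v[i] += t[i];

    t[0] = 0; t[1] = 0; t[2] = v[0]; t[3] = v[1];      /* shift by 2 lanes */
    for (int i = 0; i < 4; ++i) v[i] += t[i];
}

/* Block-wise scan. Assumption: n is a multiple of 4.
   The last element of each finished block is carried into the next block. */
static void prefix_sum_blocks(const int *in, int *out, size_t n) {
    int carry = 0;
    for (size_t i = 0; i < n; i += 4) {
        int v[4] = { in[i], in[i + 1], in[i + 2], in[i + 3] };
        scan4(v);
        for (int j = 0; j < 4; ++j) {
            out[i + j] = v[j] + carry;
        }
        carry = out[i + 3];
    }
}
```

For the input from the excerpt, [3, 1, 7, 0, 4, 1, 6, 3], both functions produce [3, 4, 11, 11, 15, 16, 22, 25]. The carry between blocks is a serial dependency, which is why a vectorized prefix sum tends to gain less than element-wise operations do.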

1 answer

IntStream leads to array elements being wrongly set to 0 (JVM Bug, Java 11)

In the class P below, the method test seems to return identically false: import java.util.function.IntPredicate; import java.util.stream.IntStream; public class P implements IntPredicate { private final static int SIZE = 33; @Override …

java arrays java-stream simd java-11

asked Dec 21 '20 at 14:51

p_i

2 answers

Modern approach to making std::vector allocate aligned memory

The following question is related; however, the answers are old, and a comment from user Marc Glisse suggests there are new approaches to this problem since C++17 that might not be adequately discussed. I'm trying to get aligned memory working properly for…

c++ c++17 stdvector simd memory-alignment

asked Feb 11 '20 at 13:19

Prunus Persica

2 answers

Choice between aligned vs. unaligned x86 SIMD instructions

There are generally two types of SIMD instructions: A. Ones that work with aligned memory addresses and raise a general-protection (#GP) exception if the address is not aligned on the operand-size boundary: movaps xmm0, xmmword ptr…

x86 sse simd avx avx512

asked Sep 03 '18 at 09:57

MikeF

2 answers

How to vectorize with gcc?

The v4 series of the gcc compiler can automatically vectorize loops using the SIMD processor on some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done?

gcc compiler-optimization simd auto-vectorization vector-processing

asked Jan 03 '09 at 16:22

casualcoder
