Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk, or vector, of data elements. CPU vector instruction sets include x86 SSE and AVX, ARM NEON, and PowerPC AltiVec. To use SIMD instructions efficiently, data needs to be in structure-of-arrays form and should occur in longer streams. Naively "SIMD optimized" code often turns out to run slower than the original.
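For illustration, a minimal C++ sketch (names hypothetical) of the array-of-structures versus structure-of-arrays layouts mentioned above; the SoA form keeps each field in a contiguous stream that SIMD lanes can load directly:

#include <cstddef>

// Array of structures: x, y, z of one element are interleaved, so a
// vector load pulls in fields the loop may not need.
struct ParticleAoS { float x, y, z; };

// Structure of arrays: each field is its own contiguous stream, which
// is the layout SIMD loads and stores want.
struct ParticlesSoA {
    float *x;
    float *y;
    float *z;
};

// Scalar loop over the SoA layout; with contiguous data a compiler can
// map consecutive iterations onto SIMD lanes.
void scale_x(ParticlesSoA &p, float s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        p.x[i] *= s;
}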
Questions tagged [simd]
2540 questions
17
votes
1 answer
How to compare two vectors using SIMD and get a single boolean result?
I have two vectors of 4 integers each and I'd like to use a SIMD command to compare them (say generate a result vector where each entry is 0 or 1 according to the result of the comparison).
Then, I'd like to compare the result vector to a vector of…

N.M
- 685
- 1
- 9
- 22
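One common approach (a sketch, assuming SSE2 and four 32-bit integer elements as in the question) is to compare lane-wise and then collapse the per-lane mask into a single integer with a movemask:

#include <emmintrin.h>  // SSE2

// Returns true only if all four 32-bit lanes of a and b compare equal.
bool all_equal(__m128i a, __m128i b) {
    __m128i eq = _mm_cmpeq_epi32(a, b);      // per lane: 0xFFFFFFFF if equal, else 0
    return _mm_movemask_epi8(eq) == 0xFFFF;  // gather the 16 byte sign bits into one int
}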
17
votes
4 answers
Any Lisp extensions for CUDA?
I just noted that one of the first languages for the Connection-Machine of W.D. Hillis was *Lisp, an extension of Common Lisp with parallel constructs. The Connection-Machine was a massively parallel computer with SIMD architecture, much the same as…

Halberdier
- 1,164
- 11
- 15
17
votes
3 answers
Why does SSE set (_mm_set_ps) reverse the order of arguments
I recently noticed that
__m128 m = _mm_set_ps(0,1,2,3);
puts the 4 floats into reverse order when cast to a float array:
float* p = (float*)(&m);
// p[0] == 3
// p[1] == 2
// p[2] == 1
// p[3] == 0
The same happens with a union { __m128 m;…

Inverse
- 4,408
- 2
- 26
- 35
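In short, and as a hedged sketch: _mm_set_ps takes its arguments from the highest lane down to the lowest, while _mm_setr_ps takes them in memory order, so the two initializers below produce the same register contents:

#include <xmmintrin.h>  // SSE

// _mm_set_ps(e3, e2, e1, e0): the first argument lands in the highest lane.
__m128 a = _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);

// _mm_setr_ps(e0, e1, e2, e3): arguments in ascending memory order.
__m128 b = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);

// Viewed through a float*, both give p[0]==0, p[1]==1, p[2]==2, p[3]==3.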
17
votes
1 answer
Do I get a performance penalty when mixing SSE integer/float SIMD instructions
I've used x86 SIMD instructions (SSE1234) in the form of intrinsics quite a lot lately. What I found frustrating is that the SSE ISA has several simple instructions that are available only for floats or only for integers, but in theory should…
user283145
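To illustrate the kind of mixing the question describes (a sketch, not a benchmark): the bitwise AND below can be done with either a float-domain or an integer-domain instruction, and on many microarchitectures forwarding a result between those execution domains adds a small bypass latency.

#include <emmintrin.h>  // SSE2

// Stays in the floating-point domain.
__m128 and_fp_domain(__m128 v, __m128 mask) {
    return _mm_and_ps(v, mask);
}

// Same bits computed in the integer domain; the casts emit no instructions.
__m128 and_int_domain(__m128 v, __m128 mask) {
    __m128i r = _mm_and_si128(_mm_castps_si128(v), _mm_castps_si128(mask));
    return _mm_castsi128_ps(r);
}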
17
votes
1 answer
How do the Conflict Detection instructions make it easier to vectorize loops?
The AVX512CD instruction families are: VPCONFLICT, VPLZCNT and VPBROADCASTM.
The Wikipedia section about these instructions says:
The instructions in AVX-512 conflict detection (AVX-512CD) are
designed to help efficiently calculate conflict-free…

zr.
- 7,528
- 11
- 50
- 84
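A rough sketch of the pattern AVX-512CD targets (assuming AVX-512F/CD): a vectorized histogram or scatter update is only safe when no two lanes write the same index, and _mm512_conflict_epi32 (VPCONFLICTD) reports exactly those intra-register duplicates.

#include <immintrin.h>  // AVX-512F + AVX-512CD

// For a vector of 16 bucket indices, return a mask of lanes that collide
// with an earlier lane. If the mask is 0, all 16 updates can proceed in
// parallel; otherwise the colliding lanes need extra handling.
__mmask16 conflicting_lanes(__m512i indices) {
    __m512i conflicts = _mm512_conflict_epi32(indices);  // per lane: bitmask of earlier equal lanes
    return _mm512_test_epi32_mask(conflicts, conflicts); // nonzero lane -> conflict
}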
17
votes
2 answers
Does R leverage SIMD when doing vectorized calculations?
Given a dataframe like this in R:
+---+---+
| X | Y |
+---+---+
| 1 | 2 |
| 2 | 4 |
| 4 | 5 |
+---+---+
If a vectorized operation is performed on this dataframe, like so:
data$Z <- data$X * data$Y
Will this leverage the processor's…

Jochen van Wylick
- 5,303
- 4
- 42
- 64
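For context, a hedged sketch (not R's actual source): the element-wise multiply underneath data$X * data$Y ultimately runs a plain loop like the one below inside the interpreter, so whether SIMD is used depends on how that loop was compiled (auto-vectorization, target flags), not on anything written in R.

#include <cstddef>

// Hypothetical inner loop of an element-wise multiply: compiled with
// optimization and a suitable target, a compiler may auto-vectorize it;
// nothing in the code asks for SIMD explicitly.
void mul(const double *x, const double *y, double *z, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        z[i] = x[i] * y[i];
}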
17
votes
2 answers
Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math
Does anyone know why GCC/Clang will not optimise function test1 in the code sample below to simply use the RCPPS instruction when using the fast-math option? Is there another compiler flag that would generate this code?
typedef float float4…

Chris_F
- 4,991
- 5
- 33
- 63
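The excerpt's code is truncated; a minimal reconstruction of the kind of function being discussed (GCC/Clang vector extensions, names assumed) is below. Even under -ffast-math, compilers typically emit DIVPS, or RCPPS plus a Newton-Raphson refinement step, rather than a bare RCPPS, because the hardware reciprocal approximation only provides roughly 12 bits of precision.

// Compile with e.g. -O2 -msse -ffast-math (flags as discussed in the question)
typedef float float4 __attribute__((vector_size(16)));

// Reciprocal of four packed floats; the question asks why this does not
// become a single RCPPS under fast-math.
float4 test1(float4 v) {
    float4 one = {1.0f, 1.0f, 1.0f, 1.0f};
    return one / v;
}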
17
votes
3 answers
practical BigNum AVX/SSE possible?
SSE/AVX registers could be viewed as integer or floating point BigNums. That is, one could neglect that there exist lanes at all. Does there exist an easy way to exploit this point of view and use these registers as BigNums either singly or…

user1095108
- 14,119
- 9
- 58
- 116
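A sketch of why this is harder than it looks: packed adds do not propagate carries between lanes, so treating an XMM register as one big integer means reconstructing the carries yourself (helper names hypothetical).

#include <emmintrin.h>  // SSE2
#include <cstdint>

// _mm_add_epi64 adds the two 64-bit lanes independently; a carry out of
// the low lane is simply lost, so this is NOT a 128-bit add.
__m128i lanewise_add(__m128i a, __m128i b) {
    return _mm_add_epi64(a, b);
}

// A correct 128-bit add has to detect and propagate the carry itself,
// which is what makes a "BigNum view" of SSE/AVX registers awkward.
void add128(const uint64_t a[2], const uint64_t b[2], uint64_t out[2]) {
    uint64_t lo = a[0] + b[0];
    uint64_t carry = lo < a[0] ? 1u : 0u;  // unsigned overflow -> carry
    out[0] = lo;
    out[1] = a[1] + b[1] + carry;
}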
17
votes
1 answer
Why can't I specify the calling convention for a constructor (C++)?
In Visual Studio 2013 a new calling convention, __vectorcall, exists. It is intended for use with SSE data types that can be passed in SSE registers.
You can specify the calling convention of a member function like so.
struct Vector{//a 16 byte…

Froglegs
- 1,095
- 1
- 11
- 21
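A cut-down sketch of the situation (MSVC-specific; struct members assumed): __vectorcall is accepted on ordinary member functions, where it lets __m128 arguments travel in registers, but, as the question reports, it cannot be specified for the constructor.

#include <xmmintrin.h>

struct Vector {        // a 16-byte wrapper, as in the question
    __m128 data;

    // Plain member function: __vectorcall is accepted here.
    Vector __vectorcall add(Vector other) const {
        Vector r;
        r.data = _mm_add_ps(data, other.data);
        return r;
    }

    // __vectorcall Vector(__m128 v);  // what the question asks about:
    //                                 // MSVC rejects a calling convention here
};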
17
votes
5 answers
Fast Vector Math in .NET - What are the options?
My 3D graphics software, written in C# using SlimDX, does a lot of vector operations on the CPU. (In this specific situation, it is not possible to offload the work to the GPU).
How can I make my vector math faster? So far, I have found these…

LTR
- 1,226
- 2
- 17
- 39
16
votes
3 answers
How to dump all the XMM registers in gdb?
I can dump all the integer registers in gdb with just:
info registers
for the XMM registers (Intel) I need a file like:
print $xmm0
print $xmm1
...
print $xmm15
and then source that file. Is there an easier way?

Peeter Joot
- 7,848
- 7
- 48
- 82
16
votes
2 answers
_mm_load_ps vs. _mm_load_pd vs. etc on Intel x86 ISA
What's the difference between the following two lines?
__m128 x = _mm_load_ps((float *) ptr);
__m128d y = _mm_load_pd((double *)ptr);
In other words, why are there so many different _mm_load_xyz instructions, instead of a generic __m128…

user541686
- 205,094
- 128
- 528
- 886
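A small sketch of the distinction: both intrinsics load 16 aligned bytes, but they return different C types and map to instructions (MOVAPS vs MOVAPD) that tag the data as packed single versus packed double floats, which is why the intrinsic API does not collapse into one generic load.

#include <emmintrin.h>  // SSE2 (for the double version)

void load_examples(const float *fp, const double *dp) {
    __m128  four_floats = _mm_load_ps(fp);  // 4 x float,  16-byte aligned
    __m128d two_doubles = _mm_load_pd(dp);  // 2 x double, 16-byte aligned
    (void)four_floats;
    (void)two_doubles;
}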
16
votes
2 answers
Explaining the different types in Metal and SIMD
When working with Metal, I find there's a bewildering number of types and it's not always clear to me which type I should be using in which context.
In Apple's Metal Shading Language Specification, there's a pretty clear table of which types are…

kennyc
- 5,490
- 5
- 34
- 57
16
votes
1 answer
Does compiler use SSE instructions for a regular C code?
I see people using -msse -msse2 -mfpmath=sse flags by default hoping that this will improve performance. I know that SSE gets engaged when special vector types are used in the C code. But do these flags make any difference for regular C code? Does…

Jennifer M.
- 1,398
- 1
- 9
- 11
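For illustration (a sketch; flags taken from the question): with -mfpmath=sse the compiler routes ordinary scalar float math through SSE instructions such as ADDSS and MULSS instead of the x87 FPU, and with optimization enabled it may also auto-vectorize plain loops like this one.

// Compile with e.g.: gcc -O2 -msse2 -mfpmath=sse (flags from the question)

// No vector types or intrinsics anywhere: scalar math uses SSE registers
// because of -mfpmath=sse, and the loop itself may be auto-vectorized.
void saxpy(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}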
16
votes
1 answer
What's the difference between _mm256_lddqu_si256 and _mm256_loadu_si256
I had been using _mm256_lddqu_si256 based on an example I found online. Later I discovered _mm256_loadu_si256. The Intel Intrinsics guide only states that the lddqu version may perform better when crossing a cache line boundary. What might be the…

Jimbo
- 2,886
- 2
- 29
- 45
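For reference, the two intrinsics share a signature and both perform an unaligned 256-bit load; a minimal side-by-side (a sketch) looks like this, the only difference being which instruction (VMOVDQU vs VLDDQU) the compiler emits, and on most recent CPUs the two behave identically.

#include <immintrin.h>  // AVX

void loads(const void *p) {
    __m256i a = _mm256_loadu_si256((const __m256i *)p);  // VMOVDQU
    __m256i b = _mm256_lddqu_si256((const __m256i *)p);  // VLDDQU
    (void)a;
    (void)b;
}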