Questions tagged [simd]

Single instruction, multiple data (SIMD) is the concept of having each instruction operate on a small chunk or vector of data elements. CPU vector instruction sets include x86 SSE and AVX, ARM NEON, and PowerPC AltiVec. To use SIMD instructions efficiently, data usually needs to be laid out in structure-of-arrays form and processed in long streams. Naively "SIMD-optimized" code frequently surprises its author by running slower than the original.
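A minimal sketch of the structure-of-arrays point above, assuming an x86 target with SSE. With an array-of-structures layout such as struct { float x, y, z; }, one 4-float load would mix coordinates from different points; keeping x, y and z in separate arrays lets each _mm_add_ps update four points at once. PointsSoA and translate_soa are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128 holds four floats */

/* Structure-of-arrays layout: each coordinate has its own contiguous array,
   so one 128-bit load picks up four x values (or four y, or four z). */
typedef struct {
    float *x, *y, *z;
    size_t n;  /* number of points; assumed to be a multiple of 4 here */
} PointsSoA;

/* Translate every point by (dx, dy, dz), four points per SSE add. */
void translate_soa(PointsSoA *p, float dx, float dy, float dz)
{
    assert(p->n % 4 == 0);
    const __m128 vdx = _mm_set1_ps(dx);
    const __m128 vdy = _mm_set1_ps(dy);
    const __m128 vdz = _mm_set1_ps(dz);
    for (size_t i = 0; i < p->n; i += 4) {
        _mm_storeu_ps(p->x + i, _mm_add_ps(_mm_loadu_ps(p->x + i), vdx));
        _mm_storeu_ps(p->y + i, _mm_add_ps(_mm_loadu_ps(p->y + i), vdy));
        _mm_storeu_ps(p->z + i, _mm_add_ps(_mm_loadu_ps(p->z + i), vdz));
    }
}
```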

2540 questions

1 answer

Trying to add an __m128 using an AND mask in SSE programming

I am trying to use the result of a compare operation to add to an SSE variable. I have just realised that when using the _mm_cmplt_ps operation, a true result comes back as a NaN, because 0xffffffff can't be represented as a number, which is of no use to…

asked Apr 04 '13 at 10:23

user1850254

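A minimal sketch of the usual mask trick, assuming SSE on x86. The all-ones pattern that _mm_cmplt_ps produces reads as a NaN only if treated as a float; used as a bit mask with _mm_and_ps, it selects the increment in true lanes and +0.0f in false lanes, so a single add performs a conditional add. add_where_less is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_cmplt_ps, _mm_and_ps, _mm_add_ps */

/* acc[i] += inc wherever a[i] < b[i]; n must be a multiple of 4.
   _mm_cmplt_ps sets a lane to all-ones bits (0xFFFFFFFF) where the
   comparison holds and to all-zero bits elsewhere. ANDing that mask with
   the increment keeps inc in the true lanes and leaves +0.0f in the false
   lanes, so one add does the job. */
void add_where_less(float *acc, const float *a, const float *b,
                    float inc, size_t n)
{
    assert(n % 4 == 0);
    const __m128 vinc = _mm_set1_ps(inc);
    for (size_t i = 0; i < n; i += 4) {
        __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        __m128 addend = _mm_and_ps(mask, vinc);  /* inc or +0.0f per lane */
        _mm_storeu_ps(acc + i, _mm_add_ps(_mm_loadu_ps(acc + i), addend));
    }
}
```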

1 answer

Using XMVECTOR from DirectXMath as a class member causes a crash only in Release Mode?

I've been trying to use XMVECTOR as a class member for a bounding box, since I do a lot of calculations, but I use the XMFLOAT3 only once per frame, so the bounding box has a method that gives me its center in an XMFLOAT3, otherwise it stays in a…

c++ simd release-mode directxmath

asked Apr 01 '13 at 23:10

ulak blade

4 answers

SIMD intrinsics - are they usable on gpus?

I'm wondering whether I can use SIMD intrinsics in GPU code, such as a CUDA kernel or an OpenCL one. Is that possible?

c++ cuda opencl simd

asked Feb 19 '13 at 13:45

Johnny Pauling

1 answer

NEON VLD consuming more cycles than expected?

I have a simple piece of assembly code that loads 12 NEON quad registers, and I have interleaved pairwise add instructions with the loads (to exploit the dual-issue capability). I have verified the code…

embedded arm simd neon cortex-a8

asked Feb 14 '13 at 07:21

nguns

1 answer

Forcing automatic vectorization with GCC

Here is my very simple question. With ICC I know it is possible to use #pragma simd to force vectorization of loops that the compiler chooses not to vectorize. Is there something analogous in GCC? Or is there any plan to add this feature in a future…

c gcc vectorization simd auto-vectorization

asked Feb 06 '13 at 16:22

user2047635

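GCC 4.9 and later accept OpenMP 4.0's #pragma omp simd when compiling with -fopenmp-simd (or -fopenmp); like ICC's pragma, it asks the compiler to vectorize the loop and makes the programmer responsible for the iterations being independent. GCC also has #pragma GCC ivdep, which only asserts that there are no loop-carried dependencies. A minimal sketch, assuming a command line such as gcc -O2 -fopenmp-simd; scale_from_ahead is a hypothetical loop whose safety depends on a run-time value (k) that the compiler cannot see.

```c
#include <assert.h>

/* a[i] = s * a[i + k] for i in [0, n); a must hold at least n + k floats.
   For k >= 0, iteration i reads a[i + k], which no earlier iteration has
   written, so running several iterations at once in SIMD lanes gives the
   same result. The compiler cannot see the value of k, so it may decline
   to vectorize; the pragma makes the programmer's promise explicit.
   Without -fopenmp-simd the pragma is ignored and the loop runs as
   ordinary scalar code. */
void scale_from_ahead(float *a, int n, int k, float s)
{
    assert(k >= 0);  /* the independence promise only holds for k >= 0 */
    #pragma omp simd
    for (int i = 0; i < n; i++)
        a[i] = s * a[i + k];
}
```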

1 answer

xmm instructions - segmentation fault with memory source operand

I'm trying to add 4 numbers to 4 other numbers in assembly language with SSE2 instructions, using XMM registers. I succeeded, but I came across something I didn't understand. If I do the addition this way: movdqu xmm0, oword [var1] movdqu xmm1,…

assembly x86 sse simd memory-alignment

asked Dec 23 '12 at 20:53

Catalin Vasile

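A C-intrinsics sketch of a likely cause, assuming legacy (non-VEX) SSE2 code on x86: movdqu accepts any address, but an SSE instruction that takes its source operand directly from memory, such as paddd xmm0, [mem], requires that operand to be 16-byte aligned and raises a general-protection fault (reported as a segmentation fault on Linux) otherwise. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Works for any addresses: _mm_loadu_si128 maps to movdqu (or an unaligned
   VEX load), so the add itself only ever sees register operands. */
void add4_i32_unaligned(int32_t out[4], const int32_t a[4], const int32_t b[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Requires 16-byte alignment: dereferencing a __m128i pointer lets the
   compiler fold the load into paddd's memory operand, which in legacy SSE
   code faults on a misaligned address. */
void add4_i32_aligned(int32_t out[4], const int32_t a[4], const int32_t b[4])
{
    assert(((uintptr_t)a & 15) == 0 && ((uintptr_t)b & 15) == 0
           && ((uintptr_t)out & 15) == 0);
    __m128i va = _mm_load_si128((const __m128i *)a);
    __m128i vb = *(const __m128i *)b;  /* may become: paddd xmm, [b] */
    _mm_store_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

When the data cannot be guaranteed to be aligned, loading both operands into registers first (as the unaligned variant does) sidesteps the fault.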

2 answers

SSE operation on 4 arrays of integer size

Sorry for the previous non-descriptive question. Please allow me to rephrase it: The setup: I need to do an ADD and some bitwise operations on 4 32-bit values from 4 arrays at the same time using SSE. All the elements in these 4 arrays…

c assembly sse simd intrinsics

asked Nov 29 '12 at 21:01

fiftyplus

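The excerpt is cut off before the actual operations, so the expression used here, out[i] = (a[i] + b[i]) & (c[i] | d[i]), is a hypothetical placeholder. The sketch assumes the question means lane-wise work: four consecutive 32-bit elements are loaded from each array, combined with SSE2 integer instructions, and stored, giving four results per iteration. combine4 is a hypothetical name, and n is assumed to be a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Hypothetical operation: out[i] = (a[i] + b[i]) & (c[i] | d[i]).
   Processes four 32-bit lanes per iteration; n must be a multiple of 4. */
void combine4(uint32_t *out, const uint32_t *a, const uint32_t *b,
              const uint32_t *c, const uint32_t *d, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        __m128i vd = _mm_loadu_si128((const __m128i *)(d + i));
        __m128i sum = _mm_add_epi32(va, vb);  /* lane-wise add, wraps mod 2^32 */
        __m128i bits = _mm_or_si128(vc, vd);  /* lane-wise bitwise OR */
        _mm_storeu_si128((__m128i *)(out + i), _mm_and_si128(sum, bits));
    }
}
```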

3 answers

assembly intrinsic to do a masked load

int main() { const int STRIDE=2,SIZE=8192; int i=0; double u[SIZE][STRIDE]; #pragma vector aligned for(i=0;i

c assembly sse simd intrinsics

asked Nov 03 '12 at 00:12

arunmoezhi

2 answers

Avoiding invalid memory load with SIMD instructions

I am loading elements from memory using SIMD load instructions, let's say using AltiVec, assuming aligned addresses: float X[SIZE]; vector float V0; unsigned FLOAT_VEC_SIZE = sizeof(vector float); for (int load_index =0; load_index < SIZE;…

simd altivec

asked Oct 23 '12 at 11:27

fsheikh

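A common way to avoid invalid loads is to run the SIMD loop only over complete vectors and finish the leftover elements with scalar code. This sketch swaps in SSE intrinsics instead of AltiVec so that it can be tested on x86; the loop structure is the same for any vector width. sum_floats is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE */

/* Sum x[0..n) without ever loading past x[n - 1]. */
float sum_floats(const float *x, size_t n)
{
    assert(n == 0 || x != NULL);
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    /* Main loop: only whole 4-float vectors. Writing the bound as
       i + 4 <= n (rather than i <= n - 4) avoids unsigned wrap-around
       when n < 4. */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    /* Scalar tail: the 0 to 3 leftover elements. */
    for (; i < n; i++)
        total += x[i];
    return total;
}
```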

2 answers

Fast Saturate and shift two Halfwords in ARM asm

I have two signed 16-bit values in a 32-bit word, and I need to shift them right (divide) by a constant value (it can be from 1 to 6) and saturate to a byte (0..0xFF). For example, 0x FFE1 00AA with shift=5 must become 0x 0000 0005; 0x 2345 1234 must…

optimization assembly arm bit-manipulation simd

asked Aug 16 '09 at 16:28

zxcat

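A portable C reference for the halfword operation described above, useful for checking an optimized ARM routine against; it is not ARM assembly. It assumes each halfword is a signed 16-bit value, that "shift right" means an arithmetic shift, and that each result stays in its original halfword position. Because a negative value stays negative after an arithmetic shift, negative inputs saturate straight to 0. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 16-bit field (0..0xFFFF) as a signed value without relying
   on implementation-defined conversions. */
static int32_t half_to_signed(uint32_t h)
{
    return h >= 0x8000u ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Shift one signed halfword right and saturate the result to 0..0xFF. */
static uint32_t sat_shift_half(int32_t v, unsigned shift)
{
    if (v < 0)
        return 0;  /* negative stays negative after the shift: clamp to 0 */
    uint32_t r = (uint32_t)v >> shift;
    return r > 0xFFu ? 0xFFu : r;
}

/* Apply the operation to both halfwords of w, keeping their positions. */
uint32_t sat_shift_halves(uint32_t w, unsigned shift)
{
    assert(shift >= 1 && shift <= 6);
    uint32_t hi = sat_shift_half(half_to_signed(w >> 16), shift);
    uint32_t lo = sat_shift_half(half_to_signed(w & 0xFFFFu), shift);
    return (hi << 16) | lo;
}
```

Under these assumptions, sat_shift_halves(0xFFE100AA, 5) returns 0x00000005, matching the first example in the question.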

0 answers

SSE floating point dot product for dummies

I have read many SO questions about SSE/SIMD (e.g., Getting started with SSE), but I'm still confused by all of it. All I want is a dot product between two double precision floating-point vectors, in C (C99 FWIW). I'm using GCC. Can someone post a…

gcc sse simd dot-product

asked Oct 05 '12 at 03:33

purple51

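A minimal SSE2 sketch, assuming GCC on x86 (SSE2 is always available on x86-64; 32-bit builds need -msse2). Each __m128d holds two doubles, the loop multiplies and accumulates two pairs per iteration, and the two partial sums are combined at the end. dot_sse2 is a hypothetical name. Because the additions happen in a different order than in a plain scalar loop, the result can differ from it in the last bits.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: __m128d holds two doubles */

/* Dot product of a[0..n) and b[0..n), two elements per iteration. */
double dot_sse2(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d prod = _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i));
        acc = _mm_add_pd(acc, prod);  /* two running partial sums */
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];  /* combine the partial sums */
    if (i < n)                         /* odd n: one element left over */
        sum += a[i] * b[i];
    return sum;
}
```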

1 answer

Are arrays initialized like `float[10][10]` already memory aligned for SIMD/SSE?

I need to optimize my matrix multiplication by using SIMD/Intel SSE. The example code given looks like: *x = (float*)memalign(16, size * sizeof(float)); However, I am using C++ and found that instead of malloc (before doing SIMD), I should…

c++ sse simd

asked Oct 03 '12 at 13:33

Jiew Meng

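A sketch of the row-alignment issue, assuming C11 (_Alignas; C++11 spells it alignas) and SSE. A float[10][10] is only guaranteed the alignment of float, and even if its start happens to be 16-byte aligned, each row is 40 bytes long, so row 1 begins at byte 40, which is not a multiple of 16; an aligned load (_mm_load_ps) from any odd row would then fault. Padding each row to 12 floats (48 bytes) and aligning the storage keeps every row start aligned. Mat, N, LD and mat_add are hypothetical names; using unaligned loads (_mm_loadu_ps) instead is the simpler alternative when padding is not an option. A Mat on the heap additionally needs an aligned allocator such as C11 aligned_alloc or _mm_malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE */

#define N  10  /* logical matrix size */
#define LD 12  /* row stride padded to a multiple of 4 floats (48 bytes) */

/* _Alignas(16) aligns the start of the storage; the padded stride keeps
   every row start at a multiple of 16 bytes as well. */
typedef struct {
    _Alignas(16) float v[N][LD];
} Mat;

/* dst = a + b, using aligned loads and stores. The padding columns
   (10 and 11) are added too, so keep them initialized, e.g. to zero. */
void mat_add(Mat *dst, const Mat *a, const Mat *b)
{
    for (int r = 0; r < N; r++) {
        assert(((uintptr_t)&a->v[r][0] & 15) == 0);
        for (int c = 0; c < LD; c += 4) {
            __m128 s = _mm_add_ps(_mm_load_ps(&a->v[r][c]),
                                  _mm_load_ps(&b->v[r][c]));
            _mm_store_ps(&dst->v[r][c], s);
        }
    }
}
```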

1 answer

ROS (Robot Operating System) with SSSE3 flag

I started working with ROS lately and got stuck on one problem. I need to use some classes which require SSE2, SSE3 and SSSE3 CPU extensions. I tried to edit the manifest.xml file of my ROS package like…

x86 simd sse2 ros sse3

asked Sep 29 '12 at 20:24

SolvedForHome

1 answer

Is it possible to execute MIMD with OpenCL framework?

Soon enough we will have the nVidia GTX 300, which will be able to execute multiple instructions on multiple data (MIMD). I wonder whether OpenCL can execute MIMD?

parallel-processing opencl nvidia simd

asked Jul 31 '09 at 21:27

Roman Kagan

1 answer

How to align 16-bit ints for use with SSE intrinsics

I am working with two-dimensional arrays of 16-bit integers defined as int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE]; int16_t C[MAX_SIZE][MAX_SIZE]; where MAX_SIZE and MAX_NODE are constant values. I'm not a professional programmer, but somehow with the…

c sse simd memory-alignment sse2

asked Jun 16 '12 at 21:31

SMir

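A sketch under assumed values MAX_SIZE = 16 and MAX_NODE = 4 (the question does not give the real constants). A __m128i holds eight int16_t, so a row of MAX_SIZE elements stays 16-byte aligned only when MAX_SIZE is a multiple of 8, which _Static_assert checks at compile time; _Alignas(16) (C11, or GCC's older __attribute__((aligned(16)))) aligns the start of each array. accumulate_row is a hypothetical helper that adds one row of e into one row of C with signed saturation.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128i holds eight int16_t */

/* Hypothetical values; the question does not give the real constants. */
#define MAX_SIZE 16
#define MAX_NODE 4

/* A row is 2 * MAX_SIZE bytes, so rows stay 16-byte aligned only when
   MAX_SIZE is a multiple of 8. */
_Static_assert(MAX_SIZE % 8 == 0, "pad MAX_SIZE to a multiple of 8");

static _Alignas(16) int16_t e[MAX_SIZE * MAX_NODE][MAX_SIZE];
static _Alignas(16) int16_t C[MAX_SIZE][MAX_SIZE];

/* C[r][j] += e[re][j] for all j, saturating at the int16_t limits,
   eight lanes at a time. */
void accumulate_row(int r, int re)
{
    assert(((uintptr_t)C[r] & 15) == 0 && ((uintptr_t)e[re] & 15) == 0);
    for (int j = 0; j < MAX_SIZE; j += 8) {
        __m128i *dst = (__m128i *)&C[r][j];
        __m128i src = _mm_load_si128((const __m128i *)&e[re][j]);
        _mm_store_si128(dst, _mm_adds_epi16(_mm_load_si128(dst), src));
    }
}
```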