Questions tagged [avx]

Advanced Vector Extensions (AVX) is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD.

AVX provides a new encoding for all previous Intel SSE instructions, giving 3-operand non-destructive operation. It also introduces double-width ymm vector registers, and some new instructions for manipulating them. The floating-point vector instructions have 256b versions in AVX, but 256b integer instructions require AVX2. AVX2 also introduced lane-crossing floating-point shuffles with element granularity (VPERMPS, VPERMPD); AVX's own lane-crossing shuffles, such as VPERM2F128, move whole 128b lanes.

Mixing AVX (VEX-encoded) and non-AVX (legacy SSE-encoded) instructions in the same program requires careful use of VZEROUPPER on Intel CPUs to avoid a major performance penalty. Several performance questions under this tag turned out to have exactly this as the answer.
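As a rough sketch of where VZEROUPPER fits, the C function below sums an array with 256b intrinsics and calls `_mm256_zeroupper()` before returning to code that might use legacy SSE encodings. The function name `sum_floats` is made up for illustration. The AVX path is only compiled when the compiler targets AVX (for example with `-mavx`); otherwise the scalar loop does all the work. Compilers generally insert VZEROUPPER on their own when building with AVX enabled, so the explicit call here is illustrative and matters most for hand-written assembly.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: sum n floats, using 256-bit AVX adds when available. */
float sum_floats(const float *x, size_t n)
{
    float total = 0.0f;
    size_t i = 0;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    /* Process 8 floats per iteration with unaligned loads. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));

    /* Spill the 8 partial sums and add them up in scalar code. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; ++k)
        total += lanes[k];

    /* Zero the upper halves of all ymm registers so that later
       legacy-SSE code does not hit the transition penalty. */
    _mm256_zeroupper();
#endif
    /* Scalar tail (or the whole array when AVX is not enabled). */
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

Calling `sum_floats` on the ten values 1 through 10 takes one 8-wide iteration plus a two-element scalar tail when built with AVX.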

Another pitfall for beginners is that most 256b instructions operate on two 128b lanes, rather than treating a ymm register as one long vector. Carefully study which element moves where when using UNPCKLPS and other shuffle / horizontal instructions.
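To make the in-lane behaviour concrete, here is a plain-C model of what the 256b form of UNPCKLPS (the `_mm256_unpacklo_ps` intrinsic) does. It uses no intrinsics, so it runs on any machine; the function name `unpcklps256_model` is invented for this sketch.

```c
#include <stddef.h>

/* Scalar model of 256-bit VUNPCKLPS: each 128-bit lane (4 floats) is
   handled independently, interleaving the low two elements of a and b
   taken from that same lane. */
void unpcklps256_model(const float a[8], const float b[8], float out[8])
{
    for (size_t lane = 0; lane < 2; ++lane) {
        size_t base = lane * 4;        /* first element of this lane */
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
}
```

With a = {0,1,2,3,4,5,6,7} and b = {10,11,12,13,14,15,16,17}, the result is {0,10,1,11,4,14,5,15}, not the {0,10,1,11,2,12,3,13} you would get if the ymm register were treated as one long vector.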

See the x86 tag page for guides and other resources for programming and optimising programs using AVX. See the SSE tag wiki for some guides to SIMD programming techniques, rather than just instruction-set references.

See also Crunching Numbers with AVX and AVX2 for an intro to using AVX intrinsics, with simple examples.


Interesting Q&As / FAQs:

1252 questions
0
votes
0 answers

clang in Xcode 7.2 generates vxorps

I encountered an issue where compiling cryptopp with clang from Xcode 7.2 generates a vxorps instruction in ByteQueue::ByteQueue(unsigned long). Since our product can run on old CPUs where this instruction triggers an illegal-instruction fault, I need to…
Rudolfs Bundulis
  • 11,636
  • 6
  • 33
  • 71
0
votes
1 answer

How to extract an array of properties out of an array of objects?

Imagine that I have an array of objects, like this: class Segment { public: float x1, x2, y1, y2; } Segment* SegmentList[100]; Based on this array of Segments, I want to quickly extract its properties and create vectors with all the x1, x2, y1…
Alkin
  • 1
  • 1
0
votes
0 answers

Anaconda Tensorflow Compiler Issue CPU AVX AVX2

I installed TensorFlow via Anaconda and tried testing whether it works using the short program on the website, but I ended up with this error. Is there something wrong, or is it that my CPU can't handle it? Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018,…
0
votes
0 answers

Is there a penalty for mixing x86-64 integer instructions with AVX1/2/512 instructions?

I have seen a lot of assembly using AVX (all three flavors), and in every case I have seen, the more the code concentrates on one kind of instruction, the better it performs. But, for example, things like doing a load into a 32-bit register and then…
JLV
  • 1
0
votes
0 answers

How to extract data from xmm register by index stored in another register r10 to store that dword in eax?

I need to extract the dword from XMM1 at the index stored in R10 and move it to the EAX register. How can I do that efficiently, without a memory access? The following does not compile: PEXTRD EAX,XMM1,R10d
xakepp35
  • 2,878
  • 7
  • 26
  • 54
0
votes
1 answer

SSE/AVX - VMULPD produces all zeros for small integer inputs?

I'm using X64dbg to test SSE/AVX assembly instructions to better understand their behavior before using them to write code. I've been able to test the vmovapd, vbroadcastsd, vsubpd, and vaddpd instructions this way without issue. I loaded YMM…
Gogeta70
  • 881
  • 1
  • 9
  • 23
0
votes
0 answers

C AVX2 sum array horizontaly

I have some problems with AVX2 instructions. I wrote a program in C which reads a binary file of unsigned chars and then sums them. Now I want to replace the C for loop with AVX2 instructions, but it doesn't work. This is the first time I have tried to use AVX2.…
AsdFork
  • 15
  • 5
0
votes
2 answers

Why does this AVX intrinsic cause "Segmentation fault" with clang, but not GCC?

It seems the two functions below can cause a segmentation fault when compiled with clang using -mavx (or -march=sandybridge -> skylake). void _mm256_mul_double_intrin(double* a, double* b, int N) { int nb_iters = N / ( sizeof(__m256d) / sizeof(double)…
sandthorn
  • 2,770
  • 1
  • 15
  • 59
0
votes
1 answer

Linker error GCC7 with -mavx flag

Compiling the 256-bit vector datatype (__m256d) from Intel's AVX extension with gcc7 or clang fails. I am able to compile and use 128-bit vectors (without the -mavx flag). But as soon as I try the AVX vectors either some assembler command definitions are…
0
votes
1 answer

Conda install dlib AVX support

I've just installed dlib using conda from the conda-forge channel. Is it possible to know whether it has been built with AVX support?
se7entyse7en
  • 4,310
  • 7
  • 33
  • 50
0
votes
0 answers

Which x86 ISA extensions imply support for previous SIMD extensions?

My CPU supports the following technologies: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, and AVX. When I write my code and check for hardware support, can I assume things like "If the processor supports AVX, it ALWAYS supports all of these other…
HesNotTheStig
  • 549
  • 1
  • 4
  • 9
0
votes
0 answers

AVX Command Error for integer addition

Does anyone know how to resolve this type of error? I am trying to add two 256-bit integer vectors, but I am getting the following error: cpu_avx.c:12:20: error: incompatible types when initializing type ‘__m256i’ using type ‘int’ __m256i result =…
Sagar
  • 1
  • 1
0
votes
0 answers

Why this AVX code so slow?

Here is the code; the question is why the AVX version is slower than the naive variant. const double __declspec(align(16)) mx[4] = { 1., 1., 1., -100.}; const double __declspec(align(16)) an[8] = { 8., 7., 6., 5., 4., 3., 2., 1.}; __forceinline…
Des Spigel
  • 19
  • 4
0
votes
1 answer

Inside virtualenv: How to get tensorflow to support sse 4.2 and avx

Just to say it upfront, I'm aware of all the answers that require bazel, and they didn't work for me. I'm using virtualenv as the TensorFlow website recommends. (tensorflow27)name@computersname:~$ bazel build --linkopt='-lrt' -c opt --copt=-mavx…
evolution
  • 593
  • 6
  • 20
0
votes
1 answer

SIMD -> uint16_t array to float array work on float then back to uint16_t

I am currently working on a project that manipulates images. To speed up the process (and increase my knowledge), I decided to write some of the basic functions using SIMD instructions. The code using for loops is int idx; uint16_t* A, B, C; float…