Questions tagged [avx]

Advanced Vector Extensions (AVX) is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD.

AVX provides a new encoding (VEX) for all previous Intel SSE instructions, giving 3-operand non-destructive operation. It also introduces double-width ymm vector registers and some new instructions for manipulating them. The floating-point vector instructions have 256-bit versions in AVX, but 256-bit integer instructions require AVX2. AVX2 also introduced lane-crossing floating-point shuffles (VPERMPS, VPERMPD).

Mixing AVX (VEX-encoded) and non-AVX (legacy SSE encoding) instructions in the same program requires careful use of VZEROUPPER on Intel CPUs to avoid a major performance problem. Several performance questions under this tag turned out to have exactly this as the answer.
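
As a rough illustration of where VZEROUPPER belongs, the sketch below sums an array with 256-bit intrinsics and calls `_mm256_zeroupper()` before returning to code that might use legacy SSE encodings. The function name `sum_floats` is hypothetical and not taken from any question on this page; the AVX path is compiled only when the compiler defines `__AVX__` (for example with `gcc -mavx`), and a scalar loop is used otherwise.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: sums n floats, using 256-bit AVX when available. */
float sum_floats(const float *p, size_t n)
{
    assert(p != NULL || n == 0);

    float total = 0.0f;
    size_t i = 0;
#ifdef __AVX__
    /* Accumulate 8 floats per iteration in one ymm register. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));

    /* Horizontal sum done the simple way: spill and add. */
    float tmp[8];
    _mm256_storeu_ps(tmp, acc);
    for (int k = 0; k < 8; ++k)
        total += tmp[k];

    /* Zero the upper halves of all ymm registers before control returns
       to code that may have been built with legacy SSE encodings. */
    _mm256_zeroupper();
#endif
    /* Scalar tail (or the whole array when AVX is not enabled). */
    for (; i < n; ++i)
        total += p[i];
    return total;
}
```

Compilers that target AVX typically insert VZEROUPPER at function boundaries on their own, so the explicit call matters mostly for hand-written assembly or unusual call paths; treat the placement here as a sketch under that assumption rather than a rule.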

Another pitfall for beginners is that most 256b instructions operate on two 128b lanes, rather than treating a ymm register as one long vector. Carefully study which element moves where when using UNPCKLPS and other shuffle / horizontal instructions.
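
The in-lane behaviour can be made concrete with a scalar model. The sketch below mimics what the 256-bit form of UNPCKLPS (the `_mm256_unpacklo_ps` intrinsic) does, as an illustration rather than a reference implementation; the function names `unpacklo_ps_256_model` and `check_against_hardware` are made up for this example. When the compiler defines `__AVX__`, the optional second function compares the model with the real instruction.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Scalar model of 256-bit VUNPCKLPS. Each 128-bit lane (4 floats) is
   handled independently: the low two elements of a and b within that lane
   are interleaved. out must not alias a or b. */
void unpacklo_ps_256_model(const float a[8], const float b[8], float out[8])
{
    assert(out != a && out != b);
    for (size_t lane = 0; lane < 2; ++lane) {
        size_t base = lane * 4;   /* index of the lane's first element */
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
}

#ifdef __AVX__
/* Optional cross-check against the real instruction; only compiled when
   AVX code generation is enabled (e.g. gcc -mavx). */
void check_against_hardware(const float a[8], const float b[8])
{
    float model[8], hw[8];
    unpacklo_ps_256_model(a, b, model);
    _mm256_storeu_ps(hw, _mm256_unpacklo_ps(_mm256_loadu_ps(a),
                                            _mm256_loadu_ps(b)));
    for (int i = 0; i < 8; ++i)
        assert(model[i] == hw[i]);
}
#endif
```

With a = {0, 1, 2, 3, 4, 5, 6, 7} and b = {10, 11, 12, 13, 14, 15, 16, 17}, the model yields {0, 10, 1, 11, 4, 14, 5, 15}. Treating the register as one long vector would instead suggest {0, 10, 1, 11, 2, 12, 3, 13}, which is not what the instruction produces.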

See the x86 tag page for guides and other resources for programming and optimising programs using AVX. See the SSE tag wiki for some guides to SIMD programming techniques, rather than just instruction-set references.

See also Crunching Numbers with AVX and AVX2 for an intro to using AVX intrinsics, with simple examples.


Interesting Q&As / FAQs:

1252 questions
0 votes, 1 answer

GCC inline SSE code

Something bugs me regarding the vector extensions. The document Intel® Advanced Vector Extensions Programming Reference states: VPSRLD ymm1, ymm2, imm8. So I went ahead and wrote: __asm__ ( "vpsrld %ymm0, %ymm0, $0x4" ); GCC 4.8.2-19ubuntu1 spits…
Anders Cedronius
0 votes, 1 answer

Using intrinsics to find next non-zero in an array

I have an int array[10000] and I want to iterate from a certain position to find the next non-zero index. Currently I use a basic while loop: while(array[i] == 0){ pos++; } etc. I know with intrinsics I could test 4 integers for zero at a time,…
user997112
0 votes, 1 answer

Check for zeros horizontally across __m128i vector?

I have several __m128i vectors containing 32-bit unsigned integers and I would like to check whether any of the 4 integers is a zero. I understand how I can "aggregate" the multiple __m128i vectors but eventually I will still end up with a single…
user997112
0 votes, 0 answers

Implement Multiply and adding 2 matrix by avx programming

I want to implement multiplication and addition of 2 matrices in Visual C++ 2012 using AVX. I enabled AVX (Advanced Vector Extensions, /arch:AVX) in Visual Studio. But for adding matrices, the time is the same whether I enable or disable this property, and…
user2855778
0 votes, 1 answer

How to specify the CFLAGS to gcc-4.6 or gcc-4.7 to use the Intel-AVX

I have an Intel Core i7-3770, and I found that it supports AVX. How do I specify the CFLAGS for gcc-4.6 or gcc-4.7 to use Intel AVX? Is there some example code or a manual about this? Thanks.
mining
0 votes, 2 answers

Using AVX with GCC: __builtin_ia32_addpd256 not declared

If I #include I get this error: error: '__builtin_ia32_addpd256' was not declared in this scope. I have defined the __AVX__ and __FMA__ macros to make AVX available, but apparently this isn't enough. There is no error if I use compiler…
Violet Giraffe
0 votes, 2 answers

Avoiding unnecessary loads (SSE/AVX)

When compiled for x64, the following function uses the XMM0 register for parameter passing: void foo (double const scalar) { __m256d vector = _mm256_broadcast_sd(&scalar); } In assembly, the vbroadcastsd opcode can take a register operand. The…
linguamachina
0 votes, 1 answer

C++ convert SSE code to AVX

With the help of YOU, I have used SSE in my code (sample below) with significant performance boost and I was wondering if this boost could be improved by using 256bit registers of AVX. int result[4] __attribute__((aligned(16))) = {0}; __m128i…
Alexandros
0 votes, 1 answer

why do the SSE and AVX have same efficiency?

I use VS2012 and want to test the efficiency of SSE and AVX. The code for SSE and AVX is almost the same, except that the SSE version uses __m128 and the AVX version uses __m256. I expected the AVX code to be two times faster than the SSE code, but the test result shows…
myej
0 votes, 1 answer

32B chunks, contiguous and non-contiguous memory accesses

I wrote a matrix-matrix (32-bit floats) multiplication function in C++ using intrinsics for large matrices (8192x8192); the minimum data size is 32B for every read and write operation. I will change the algorithm into a blocking one such that it reads a…
huseyin tugrul buyukisik
0 votes, 1 answer

G++ Asm inline: register clobbering

Does the gcc compiler use push/pop for register backup if I don't write anything in the clobber list? What happens for input and output list registers? I will write a short inline asm that saves some general purpose registers to XMM/YMM registers then plays…
huseyin tugrul buyukisik
0 votes, 1 answer

AVX and Bubble Sort

I have to develop a bubble sort algorithm with AVX instructions with single precision numbers in input. Can anyone help me to look for the best implementation? I did a bubble sort version for SSE3: global sort32 sort32: start mov eax, [ebp+8] …
Frank
0 votes, 2 answers

FLT_EPSILON for a nth root finder with SSE/AVX

I'm trying to convert a function that finds the nth root in C for a double value from the following link http://rosettacode.org/wiki/Nth_root#C to find the nth root for 8 floats at once using AVX. Part of that code uses DBL_EPSILON * 10. However,…
user2088790
0 votes, 0 answers

How to use Intel AVX on QNX Neutrino 6.5.0?

I recently started working with QNX 6.5.0 and can't work out how to develop programs using Intel AVX on QNX. I installed QNX Development Studio 6.5.0 with GCC 4.4.2 and am trying to write a simple program, but the build fails. #include int…
Ildar
0 votes, 1 answer

Performing AVX integer operation

I'm trying to optimize some integer (_int64) operations using AVX. However, I can't even get a simple add operation to work. It keeps telling me illegal instruction. Please can I be corrected on what I'm doing wrong? Thanks. for (int i = 0; i < 1; i+=4) { __m256i…
FrancFine