Questions tagged [auto-vectorization]
145 questions
314
votes
9 answers
What is "vectorization"?
Several times now, I've encountered this term in MATLAB, Fortran ... some other ... but I've never found an explanation of what it means and what it does. So I'm asking here: what is vectorization, and what does it mean, for example, that "a loop…

Thomas Geritzma
- 6,337
- 6
- 25
- 19
21
votes
2 answers
How to vectorize with gcc?
The v4 series of the GCC compiler can automatically vectorize loops using the SIMD units of some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done?

casualcoder
- 4,770
- 6
- 29
- 35
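
As an illustration of the kind of loop this question is about, here is a minimal sketch, assuming a reasonably recent GCC. The function name `add_arrays` is hypothetical and not taken from the question. The loop has a simple trip count, no cross-iteration dependency, and `restrict`-qualified pointers, which are the usual preconditions for the auto-vectorizer to succeed.

```c
#include <stddef.h>

/* Hypothetical example: element-wise sum of two float arrays.
   `restrict` promises the compiler that the three arrays do not
   overlap, so it does not need to assume aliasing between them. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];   /* independent per-element work */
    }
}
```

Compiling with something like `gcc -O3 -fopt-info-vec -c add.c` should make GCC report which loops it vectorized. `-fopt-info-vec` is available from GCC 4.9 onward; older releases used `-ftree-vectorizer-verbose=N` instead.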
19
votes
3 answers
Why doesn't GCC show vectorization information?
I'm using Code::Blocks for a C program on Windows 7. The program uses the OpenMP library. The GCC version is 4.9.2 (MinGW x86_64-w64-mingw32-gcc-4.9.2.exe).
Flags used are: -fopenmp -O3 -mfpmath=sse -funroll-loops -ftree-loop-distribution -ftree-vectorize…

Franktrt
- 373
- 1
- 8
- 18
16
votes
1 answer
Does the compiler use SSE instructions for regular C code?
I see people using -msse -msse2 -mfpmath=sse flags by default hoping that this will improve performance. I know that SSE gets engaged when special vector types are used in the C code. But do these flags make any difference for regular C code? Does…

Jennifer M.
- 1,398
- 1
- 9
- 11
16
votes
1 answer
Under what conditions does the .NET JIT compiler perform automatic vectorization?
Does the new RyuJIT compiler ever generate vector (SIMD) CPU instructions, and when?
Side note: The System.Numerics namespace contains types that allow explicit use of Vector operations which may or may not generate SIMD instructions depending on…

redcalx
- 8,177
- 4
- 56
- 105
14
votes
3 answers
How can I disable vectorization while using GCC?
I am compiling my code using the following command:
gcc -O3 -ftree-vectorizer-verbose=6 -msse4.1 -ffast-math
With this, all the optimizations are enabled.
But I want to disable vectorization while keeping the other optimizations.

PhantomM
- 825
- 6
- 17
- 34
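
For context, GCC accepts `-fno-tree-vectorize` on the command line, which turns off the vectorizer while leaving the rest of `-O3` in place. The sketch below shows a per-function variant, assuming GCC: the `optimize` attribute is GCC-specific (its documentation recommends it mainly for debugging), and the names `NO_VECTORIZE` and `scale_scalar` are hypothetical.

```c
#include <stddef.h>

/* Assumption: GCC. Other compilers get an empty macro, so the code
   still builds but the vectorizer is not disabled there. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Hypothetical example: multiply every element by a factor.
   With the attribute, GCC should keep this loop scalar even at -O3. */
NO_VECTORIZE
void scale_scalar(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

To apply the same setting to a whole translation unit, append `-fno-tree-vectorize` after `-O3` in the compile command.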
14
votes
1 answer
Why do compilers miss vectorization here?
Consider the following valarray-like class:
#include
struct va
{
void add1(const va& other);
void add2(const va& other);
size_t* data;
size_t size;
};
void va::add1(const va& other) {
for (size_t i = 0; i < size;…

Alex Guteniev
- 12,039
- 2
- 34
- 79
14
votes
2 answers
Simple getter/accessor prevents vectorization - gcc bug?
Consider this minimal implementation of a fixed vector:
constexpr std::size_t capacity = 1000;
struct vec
{
int values[capacity];
std::size_t _size = 0;
std::size_t size() const noexcept
{
return _size;
}
…

Vittorio Romeo
- 90,666
- 33
- 258
- 416
13
votes
1 answer
cython boundscheck=True faster than boundscheck=False
Consider the following minimal example:
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
cimport cython
from libc.stdlib cimport malloc
def main(size_t ni, size_t nt, size_t nx):
cdef:
…

antony
- 2,877
- 4
- 31
- 43
13
votes
1 answer
Why does vectorization behave differently for almost the same code?
Here are free functions that do the same thing, but in the first case the loop is not vectorized, while in the other cases it is. Why is that?
#include <vector>
typedef std::vector Vec;
void update(Vec& a, const Vec& b, double gamma) {
const…

Dmitry
- 146
- 3
11
votes
0 answers
Is integer vectorization accuracy / precision of integer division CPU-dependent?
I tried to vectorize the premultiplication of 64-bit colors with 16-bit integer ARGB channels.
I quickly realized that due to the lack of accelerated integer division support I need to convert my values to float and use some SSE2/SSE4.1 intrinsics…

György Kőszeg
- 17,093
- 6
- 37
- 65
11
votes
1 answer
-ftree-vectorize option in GNU
With the GCC compiler, the -ftree-vectorize option turns on auto-vectorization, and this flag is automatically set when using -O3. To what level does it vectorize? I.e., will I get SSE2, SSE4.2, AVX, or AVX2 instructions? I know of the existence of…

R_Kapp
- 2,818
- 1
- 18
- 32
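
One way to see which instruction sets the vectorizer is allowed to use for a given set of flags is to inspect the compiler's predefined macros. The sketch below assumes GCC or Clang on x86, where `__SSE2__`, `__SSE4_2__`, `__AVX__` and `__AVX2__` are defined when the corresponding extension is enabled (for example through `-mavx2` or `-march=native`). The function name `enabled_simd_level` is hypothetical.

```c
/* Hypothetical helper: reports the highest x86 SIMD level, among the
   four checked here, that the compiler was allowed to target when
   this file was built. It reflects compile flags, not the CPU the
   program later runs on. */
const char *enabled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the x86 levels checked here";
#endif
}
```

Building the same file with different `-m...` or `-march=...` options and comparing the result shows how the available vector instructions follow the target flags and not `-ftree-vectorize` itself.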
10
votes
1 answer
std::min vs ternary gcc auto vectorization with #pragma GCC optimize ("O3")
I know that "why is my compiler doing this" aren't the best type of questions, but this one is really bizarre to me and I'm thoroughly confused.
I had thought that std::min() was the same as the handwritten ternary (with maybe some compile time…

Maltysen
- 1,868
- 17
- 17
10
votes
1 answer
Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?
I have this piece of code, which segfaults when run on Ubuntu 14.04 on an AMD64-compatible CPU:
#include
#include
#include
int main()
{
uint32_t sum = 0;
uint8_t *buffer = mmap(NULL, 1<<18, PROT_READ,
…

kasperd
- 1,952
- 1
- 20
- 31
10
votes
3 answers
Why does GCC auto-vectorization not work on a convolution matrix bigger than 3x3?
I've implemented the following program for a convolution matrix:
#include
#include
#define NUM_LOOP 1000
#define N 128 //input or output dimension 1
#define M N //input or output dimension 2
#define P 5 //convolution matrix…

Amiri
- 2,417
- 1
- 15
- 42