Questions tagged [auto-vectorization]

145 questions
314 votes, 9 answers

What is "vectorization"?

Several times now, I've encountered this term in MATLAB, Fortran ... and some others ... but I've never found an explanation of what it means or what it does. So I'm asking here: what is vectorization, and what does it mean, for example, that "a loop…
Thomas Geritzma
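In short, vectorization means letting one instruction operate on several data elements at once (SIMD) instead of one element per instruction. A "vectorized loop" is one whose iterations have been grouped so that each SIMD instruction does the work of several iterations. The minimal C sketch below assumes GCC or Clang, because it uses their vector_size extension, and its function names are hypothetical. It contrasts a scalar loop with a hand-vectorized one that adds four floats per step:

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: one v4f value holds four floats (16 bytes). */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar version: one addition per loop iteration. */
void add_scalar(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Vectorized version: each vector addition performs four float additions.
   memcpy is used for loads/stores so the arrays need no special alignment. */
void add_vec4(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4f va, vb, vs;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vs = va + vb;                  /* element-wise: four additions */
        memcpy(out + i, &vs, sizeof vs);
    }
    for (; i < n; i++)                 /* scalar tail for leftover elements */
        out[i] = a[i] + b[i];
}
```

An auto-vectorizing compiler tries to turn the first function into something like the second on its own. In MATLAB, the word usually refers to a related idea: replacing explicit loops with whole-array operations.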
21 votes, 2 answers

How to vectorize with gcc?

The v4 series of the gcc compiler can automatically vectorize loops using the SIMD processor on some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done?
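The following sketch shows a loop shape that GCC's auto-vectorizer typically handles well: a counted loop with unit stride and no function calls. The function name and the build commands in the comment are assumptions about a typical setup. -fopt-info-vec-optimized (GCC 4.8 and later) reports which loops were vectorized. Adding -march=native lets GCC use wider extensions such as AVX when the build machine supports them.

```c
#include <stddef.h>

/*
 * y = a*x + y over n elements: unit stride, simple trip count, no calls.
 * Possible build commands (vectorization is enabled by -O3; recent GCC
 * releases also enable a more limited form at -O2):
 *   gcc -O3 -fopt-info-vec-optimized -c saxpy.c
 *   gcc -O2 -ftree-vectorize -fopt-info-vec-optimized -c saxpy.c
 */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Because x and y are not declared restrict here, GCC may emit a runtime overlap check and keep a scalar fallback path next to the vectorized one.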
19 votes, 3 answers

Why doesn't GCC show vectorization information?

I'm using Code::Blocks for a C program on Windows 7. The program uses the OpenMP library. The GCC version is 4.9.2 (MinGW x86_64-w64-mingw32-gcc-4.9.2.exe). The flags used are: -fopenmp -O3 -mfpmath=sse -funroll-loops -ftree-loop-distribution -ftree-vectorize…
Franktrt
16 votes, 1 answer

Does compiler use SSE instructions for a regular C code?

I see people using the -msse -msse2 -mfpmath=sse flags by default, hoping that this will improve performance. I know that SSE gets engaged when special vector types are used in the C code. But do these flags make any difference for regular C code? Does…
Jennifer M.
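On x86-64, SSE2 is part of the baseline instruction set. GCC therefore already uses SSE registers for scalar float and double arithmetic there, and -msse -msse2 -mfpmath=sse are the defaults; the flags matter mainly for 32-bit x86 builds. At -O3 (or with -ftree-vectorize), GCC may also vectorize plain C loops without any special types. For contrast, the sketch below takes the explicit route with SSE intrinsics from xmmintrin.h. It is guarded so that it falls back to plain C on targets without SSE, and the function name is hypothetical:

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Sums n floats. With SSE, four running sums are kept in one register. */
float sum_floats(const float *p, size_t n)
{
    size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i)); /* 4 adds per instruction */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                      /* combine the 4 partial sums */
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++)                              /* remaining elements (or all, without SSE) */
        total += p[i];
    return total;
}
```

The SSE path adds the elements in a different order than a plain loop, so for arbitrary floats the results can differ in the last bits. That reordering is also why GCC does not auto-vectorize floating-point sums like this one unless reassociation is allowed, for example via -ffast-math.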
16 votes, 1 answer

Under what conditions does the .NET JIT compiler perform automatic vectorization?

Does the new RyuJIT compiler ever generate vector (SIMD) CPU instructions, and when? Side note: The System.Numerics namespace contains types that allow explicit use of Vector operations which may or may not generate SIMD instructions depending on…
redcalx
14 votes, 3 answers

How can I disable vectorization while using GCC?

I am compiling my code using the following command: gcc -O3 -ftree-vectorizer-verbose=6 -msse4.1 -ffast-math With this, all the optimizations are enabled, but I want to disable vectorization while keeping the other optimizations.
PhantomM
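There are two common ways to turn off vectorization while keeping the rest of -O3. The first is -fno-tree-vectorize on the command line, which applies to a whole file. The second, in GCC only, is the optimize function attribute, which applies to a single function. The NO_VECTORIZE macro and the function name are hypothetical. Clang ignores GCC's optimize attribute, so the macro expands to nothing there.

```c
#include <stddef.h>

/* Whole translation unit: add the flag to the original command, e.g.
 *   gcc -O3 -fno-tree-vectorize -msse4.1 -ffast-math file.c
 * Single function (GCC only): the optimize attribute below. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE /* attribute not supported here: no-op */
#endif

NO_VECTORIZE
void scale_no_vec(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}
```

GCC's documentation describes the optimize attribute as intended mainly for debugging, so the command-line flag is the more robust choice. With Clang, #pragma clang loop vectorize(disable) placed before a loop is a per-loop alternative.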
14 votes, 1 answer

Why do compilers miss vectorization here?

Consider the following valarray-like class: #include struct va { void add1(const va& other); void add2(const va& other); size_t* data; size_t size; }; void va::add1(const va& other) { for (size_t i = 0; i < size;…
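The excerpt is truncated, so the following is an assumption about the rest of the code. One common cause with a class shaped like this is aliasing: data points to size_t, and size is itself a size_t member. The compiler must then assume that a store to data[i] could overwrite size, so it re-reads the loop bound after every store, which usually blocks vectorization. The C sketch below mirrors the question's names, but its bodies are hypothetical. It shows the usual workaround of copying the bound and the pointers into locals:

```c
#include <stddef.h>

/* C analogue of the question's class. */
struct va {
    size_t *data;
    size_t size;
};

/* Bound read through the struct on every iteration: a store to data[i]
   (type size_t) might, as far as the compiler knows, change self->size. */
void add_member_bound(struct va *self, const struct va *other)
{
    for (size_t i = 0; i < self->size; i++)
        self->data[i] += other->data[i];
}

/* Same work with the bound and pointers copied into locals first:
   stores through d cannot change n, so the trip count is fixed. */
void add_local_bound(struct va *self, const struct va *other)
{
    size_t n = self->size;
    size_t *d = self->data;
    const size_t *s = other->data;
    for (size_t i = 0; i < n; i++)
        d[i] += s[i];
}
```

The local-copy version assumes that data never points at the struct's own members, which holds for any ordinary use of such a class.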
14 votes, 2 answers

Simple getter/accessor prevents vectorization - gcc bug?

Consider this minimal implementation of a fixed vector: constexpr std::size_t capacity = 1000; struct vec { int values[capacity]; std::size_t _size = 0; std::size_t size() const noexcept { return _size; } …
Vittorio Romeo
13 votes, 1 answer

cython boundscheck=True faster than boundscheck=False

Consider the following minimal example: #cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True cimport cython from libc.stdlib cimport malloc def main(size_t ni, size_t nt, size_t nx): cdef: …
antony
13 votes, 1 answer

Why does vectorization behave differently for almost the same code?

Here are free functions that do the same thing, but in the first case the loop is not vectorized, while in the other cases it is. Why is that? #include typedef std::vector Vec; void update(Vec& a, const Vec& b, double gamma) { const…
Dmitry
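The excerpt is cut off, so the exact difference between the functions is unknown. A frequent reason for such differences is that the compiler cannot prove the output and input arrays do not overlap. The C sketch below uses hypothetical bodies and function names. It shows the restrict qualifier, which lets the programmer promise that there is no overlap. restrict is C99; GCC and Clang also accept __restrict__ in C++.

```c
#include <stddef.h>

/* a and b might overlap: the compiler must prove they don't,
   add a runtime overlap check, or leave the loop scalar. */
void update_plain(double *a, const double *b, double gamma, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += gamma * b[i];
}

/* restrict promises that a and b never overlap, so no check is needed. */
void update_restrict(double *restrict a, const double *restrict b,
                     double gamma, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += gamma * b[i];
}
```

Compiling with -fopt-info-vec-missed makes GCC report why a given loop was not vectorized. This is usually the quickest way to tell which case applies.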
11 votes, 0 answers

Is integer vectorization accuracy / precision of integer division CPU-dependent?

I tried to vectorize the premultiplication of 64-bit colors with 16-bit integer ARGB channels. I quickly realized that, due to the lack of accelerated integer division support, I need to convert my values to float and use some SSE2/SSE4.1 intrinsics…
György Kőszeg
11 votes, 1 answer

-ftree-vectorize option in GNU

With the GCC compiler, the -ftree-vectorize option turns on auto-vectorization, and this flag is automatically set when using -O3. To what level does it vectorize? I.e., will I get SSE2, SSE4.2, AVX, or AVX2 instructions? I know of the existence of…
R_Kapp
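-ftree-vectorize only uses the instruction sets that the target flags allow. On a default x86-64 GCC, that is typically SSE2. Flags such as -msse4.2, -mavx2, or -march=native (the build machine's CPU) enable wider extensions. The sketch below reports which extension GCC or Clang believes is available by testing their predefined macros. The function name is hypothetical:

```c
/* Returns the widest x86 SIMD extension enabled for this compilation.
   The answer depends only on compile-time flags, not on the CPU that runs it. */
const char *widest_simd(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected (or not x86)";
#endif
}
```

To list every predefined macro for a given set of flags, gcc -march=native -dM -E - < /dev/null prints them without compiling a source file.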
10 votes, 1 answer

std::min vs ternary gcc auto vectorization with #pragma GCC optimize ("O3")

I know that "why is my compiler doing this" aren't the best type of questions, but this one is really bizarre to me and I'm thoroughly confused. I had thought that std::min() was the same as the handwritten ternary (with maybe some compile time…
Maltysen
10 votes, 1 answer

Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?

I have this piece of code which segfaults when run on Ubuntu 14.04 on an AMD64-compatible CPU: #include #include #include int main() { uint32_t sum = 0; uint8_t *buffer = mmap(NULL, 1<<18, PROT_READ, …
kasperd
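The excerpt is cut off, so the following is an assumption about the rest of the program. A common cause of this crash is reading uint32_t values through a pointer that is not 4-byte aligned. That is undefined behavior in C, and once GCC vectorizes the loop it may use aligned SSE loads (such as movdqa) that fault on the misaligned address. The sketch below shows the usual fix: loading each word with memcpy. The function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sums 32-bit words from a byte buffer that may not be 4-byte aligned.
   memcpy expresses an unaligned load without undefined behavior;
   compilers typically turn it into a single unaligned load instruction. */
uint32_t sum_words(const uint8_t *buffer, size_t nbytes)
{
    uint32_t sum = 0;
    for (size_t off = 0; off + sizeof(uint32_t) <= nbytes; off += sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, buffer + off, sizeof w);   /* safe at any alignment */
        sum += w;                             /* unsigned wraparound is well defined */
    }
    return sum;
}
```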
10 votes, 3 answers

Why does GCC auto-vectorization not work on convolution matrices bigger than 3x3?

I've implemented the following program for a convolution matrix #include #include #define NUM_LOOP 1000 #define N 128 //input or output dimension 1 #define M N //input or output dimension 2 #define P 5 //convolution matrix…
Amiri
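The full program is cut off, so this is a general sketch rather than a diagnosis. One restructuring that tends to help is to put the kernel loops outside and the output column innermost. The innermost loop then becomes a unit-stride update with a single scalar weight, whatever the kernel size. The sizes N and P, the flat row-major layout, and the name conv2d are all hypothetical and do not match the question's definitions:

```c
#include <stddef.h>

#define N 8            /* output is N x N (hypothetical size) */
#define P 5            /* kernel is P x P */
#define IN (N + P - 1) /* input width/height for a "valid" convolution */

/* Row-major flat arrays: out[N*N], in[IN*IN], k[P*P]. */
void conv2d(float *out, const float *in, const float *k)
{
    for (size_t i = 0; i < N * N; i++)
        out[i] = 0.0f;

    for (size_t y = 0; y < N; y++)
        for (size_t ky = 0; ky < P; ky++)
            for (size_t kx = 0; kx < P; kx++) {
                const float w = k[ky * P + kx];
                const float *src = in + (y + ky) * IN + kx;
                float *dst = out + y * N;
                /* Innermost loop: unit stride over x in both arrays and
                   independent of P, so it is a simple vectorizable loop. */
                for (size_t x = 0; x < N; x++)
                    dst[x] += w * src[x];
            }
}
```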