Simple getter/accessor prevents vectorization - gcc bug?

Question

Consider this minimal implementation of a fixed-capacity vector<int>:

#include <cstddef>

constexpr std::size_t capacity = 1000;

struct vec 
{
    int values[capacity];
    std::size_t _size = 0;    

    std::size_t size() const noexcept 
    { 
        return _size; 
    }

    void push(int x) 
    {
        values[size()] = x;
        ++_size;
    }
};

Given the following test case:

int main()
{
    vec v;
    for(std::size_t i{0}; i != capacity; ++i) 
    {
        v.push(i);
    }

    // Escape &v so the compiler cannot discard the stores as dead code.
    asm volatile("" : : "g"(&v) : "memory");
}

The compiler produces non-vectorized assembly: live example on godbolt.org

If I make any of the following changes...

values[size()] -> values[_size]
Add __attribute__((always_inline)) to size()

...then the compiler produces vectorized assembly: live example on godbolt.org
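A minimal sketch of the two workarounds listed above, written as two separate structs so both can be compiled side by side. The names vec_direct and vec_inline are hypothetical, invented here for illustration. The sketch only shows the source changes; whether a given GCC version vectorises the fill loop for each variant is what the question reports, not something this code checks. __attribute__((always_inline)) is a GCC/Clang extension.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

// Hypothetical variant 1: push() indexes with the data member directly,
// so there is no call to size() that has to be inlined first.
struct vec_direct
{
    int values[capacity];
    std::size_t _size = 0;

    std::size_t size() const noexcept { return _size; }

    void push(int x)
    {
        values[_size] = x;  // direct member access instead of size()
        ++_size;
    }
};

// Hypothetical variant 2: size() is kept in push(), but forced inline
// with the GCC/Clang always_inline attribute.
struct vec_inline
{
    int values[capacity];
    std::size_t _size = 0;

    __attribute__((always_inline)) std::size_t size() const noexcept
    {
        return _size;
    }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};
```

Both variants behave identically to the original vec; they differ only in how early the compiler sees through the size() call.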

Is this a gcc bug? Or is there a reason why a simple accessor such as size() would prevent auto-vectorization unless always_inline is explicitly added?

My guess would be that the compiler can tell implicitly that the returned value is going to change. Modern compilers are pretty good at figuring out what is going to change, and in this case the return value of `size` is guaranteed to change in all cases. — Mgetz, Feb 13 '18 at 13:50
Additionally adding `__attribute__((const))` to `size()` results in auto-vectorization being applied (`__attribute__((pure))` does not). — Thomas Russell, Feb 13 '18 at 13:53
Putting `++_size` inside a class member results in vectorization (https://godbolt.org/g/toBQc7). Also, GCC 5.x and 6.x produce vectorized code (https://godbolt.org/g/wU6n8F). — Andriy Berestovskyy, Feb 14 '18 at 10:58
This is a missed optimization and a regression in gcc-7 compared to earlier versions of the compiler; please report it to gcc's bugzilla. — Marc Glisse, Apr 27 '18 at 12:40
@MarcGlisse: reported as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84362 — Vittorio Romeo, Apr 27 '18 at 12:45
@VittorioRomeo thanks, sorry we all seemed to miss your report. — Marc Glisse, Apr 27 '18 at 14:20

score 1 · Accepted Answer · answered May 02 '18 at 14:54

The loop in your example is vectorised for GCC < 7.1 and not vectorised for GCC >= 7.1, so the behaviour seems to have changed in 7.1.

We can look at the compiler optimisation report by adding -fopt-info-vec-all to the command line:

For GCC 7.3:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: not vectorized: complicated access pattern.
<source>:24:29: note: bad data access.
<source>:21:5: note: vectorized 0 loops in function.

For GCC 6.3:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.

So GCC 7.x decides not to vectorise the loop because of a complicated access pattern, which might be caused by the (at that point) non-inlined size() function. Forcing inlining, or inlining by hand, fixes that; GCC 6.x seems to inline it by itself. However, the final assembly does look as if size() was eventually inlined in both cases, but in GCC 7.x perhaps only after the vectorisation step (this is me guessing).

I wondered why you put the asm volatile(...) line at the end; probably to prevent the compiler from throwing away the whole loop, because the loop has no observable effect in this test case. If we just return the last element of v instead, we achieve the same without any possible side effects on the memory model for v.

return v.values[capacity - 1];

The code now vectorises with GCC 7.x, as it already did with GCC 6.x:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.
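For reference, a sketch of that modified test case as a complete function, assuming the question's vec definition unchanged. The function name fill_and_read_last is hypothetical; the only change from the question's test is that the last element is returned instead of passing &v to an asm volatile statement.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

// The question's fixed-capacity vector, unchanged.
struct vec
{
    int values[capacity];
    std::size_t _size = 0;

    std::size_t size() const noexcept { return _size; }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};

// Hypothetical test function: returning an element makes the stores
// observable through the return value, replacing the asm volatile barrier.
int fill_and_read_last()
{
    vec v;
    for (std::size_t i{0}; i != capacity; ++i)
    {
        v.push(static_cast<int>(i));
    }
    return v.values[capacity - 1];
}
```

Because the result now depends on the stored values, the compiler has to keep (or fully evaluate) the effect of the loop without being told that &v escapes into unknown memory.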

So what's the conclusion here?

something changed with GCC 7.1
best guess: a side effect of the asm volatile interferes with the inlining of size(), preventing vectorisation

Whether or not this is a bug (it could be in either 6.x or 7.x, depending on what behaviour is desired for the asm volatile() construct) would be a question for the GCC developers.

Also: try adding -mavx2, or -mavx512f -mavx512cd (or -march=native etc.), to the command line, depending on your hardware, to get vectorisation with registers wider than the 128-bit xmm ones, i.e. 256-bit ymm and 512-bit zmm registers.

Thanks for the great analysis! I did not know about `-fopt-info-vec-all`. It is probably worthwhile to add a link to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84362 in the answer. — Vittorio Romeo, May 03 '18 at 10:50

score 0 · Answer 2 · answered Apr 25 '18 at 13:49

I could narrow the problem down.

With double or single precision and the flags -std=c++11 -Ofast -march=native:

Clang >= 5.0.0 produces AVX move instructions with zmm registers

GCC 4.9 to 6.3 (inclusive) produces AVX move instructions with zmm registers

GCC >= 7.1.0 produces AVX move instructions with xmm registers

Try it out: https://godbolt.org/g/NXgF4g
