Is there a fundamental performance cost to using std::views::join?

Question

I've recently been evaluating the performance of ranges and views. The simple example below (also at https://www.godbolt.org/z/7ThxjKafc) shows a much larger difference in the generated assembly than I would have expected. With the latest GCC and -O3:

the assembly for sum_array contains 31 instructions and 8 jumps.
the assembly for sum_vec contains 12 instructions and 2 jumps.

Given that the size of m_array is known at compile time, I would have expected near identical assembly for both functions. Should I expect the optimizing compiler to improve in future versions, or is there some fundamental limitation in how std::views::join is specified?

#include <array>
#include <vector>
#include <ranges>

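struct Foo {
    // Flattens every vector in m_array into a single range.
    auto join() const { return m_array | std::views::join; }
    // Views only the first (and, here, only) vector.
    auto direct() const { return std::views::all(m_array[0]); }
    std::array<std::vector<int*>, 1> m_array;
};

// noinline (a GCC/Clang attribute) keeps each function's assembly separate.
__attribute__((noinline)) int sum_array(const Foo& foo)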
{
    int result = 0;
    for (int* val : foo.join())
        result += *val;
    return result;
}
__attribute__((noinline)) int sum_vec(const Foo& foo)
{
    int result = 0;
    for (int* val : foo.direct())
        result += *val;
    return result;
}

Reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106677 — Marc Glisse, Aug 18 '22 at 19:35

