I've recently been evaluating the performance of ranges and views. I've posted a simple example below (also at https://www.godbolt.org/z/7ThxjKafc) where the difference in the generated assembly is much larger than I expected. With the latest GCC and `-O3`:

- the assembly for `sum_array` contains 31 instructions and 8 jumps;
- the assembly for `sum_vec` contains 12 instructions and 2 jumps.
Given that the size of `m_array` is known at compile time, I would have expected near-identical assembly for both functions. Should I expect the optimizing compiler to improve in future versions, or is there some fundamental limitation in how `std::views::join` is specified?
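For context, here is a rough, hypothetical model of the bookkeeping a join iterator has to do. It is a simplified sketch, not the libstdc++ code: the standard specifies `join_view`'s iterator in terms of an outer iterator, an inner iterator, and an exposition-only `satisfy()` step that skips empty inner ranges, and this sketch uses indices in place of those iterators to stay short. `SimpleJoinCursor` and `sum_with_cursor` are names made up for this illustration. Even when `N` is 1, every `next()` has to re-check both "is the inner vector exhausted?" and "is the outer range done?", which is plausibly where some of the extra branches come from unless the optimizer proves the outer loop runs exactly once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a join iterator over
// std::array<std::vector<int*>, N>. Not the real library implementation.
template <std::size_t N>
struct SimpleJoinCursor {
    const std::array<std::vector<int*>, N>* outer;
    std::size_t outer_idx = 0;  // which inner vector we are in
    std::size_t inner_idx = 0;  // position inside that vector

    // Skip exhausted or empty inner vectors so the cursor always points at
    // an element or at the end (mirrors the role of join_view's satisfy()).
    void satisfy() {
        while (outer_idx < N && inner_idx == (*outer)[outer_idx].size()) {
            ++outer_idx;
            inner_idx = 0;
        }
    }

    bool at_end() const { return outer_idx == N; }

    int* current() const {
        assert(!at_end());  // dereferencing the end is a precondition violation
        return (*outer)[outer_idx][inner_idx];
    }

    // Step within the current vector, then re-establish the invariant.
    void next() {
        ++inner_idx;
        satisfy();
    }
};

// Sums all pointed-to values using the two-level cursor.
template <std::size_t N>
int sum_with_cursor(const std::array<std::vector<int*>, N>& arr) {
    SimpleJoinCursor<N> c{&arr};
    c.satisfy();  // skip any leading empty vectors
    int result = 0;
    for (; !c.at_end(); c.next())
        result += *c.current();
    return result;
}
```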
```cpp
#include <array>
#include <ranges>
#include <vector>

struct Foo {
    // Flatten the (single-element) array of vectors into one range.
    auto join() const { return m_array | std::views::join; }
    // View the only vector directly, bypassing join.
    auto direct() const { return std::views::all(m_array[0]); }

    std::array<std::vector<int*>, 1> m_array;
};

// Sums the pointed-to values through the joined view.
__attribute__((noinline)) int sum_array(const Foo& foo)
{
    int result = 0;
    for (int* val : foo.join())
        result += *val;
    return result;
}

// Sums the pointed-to values by iterating the vector directly.
__attribute__((noinline)) int sum_vec(const Foo& foo)
{
    int result = 0;
    for (int* val : foo.direct())
        result += *val;
    return result;
}
```