6

I encountered unexpected behavior when mixing the output of boost::join with std::views::transform (shown below). The compiler issues no warnings whatsoever. Fortunately, an address sanitizer detects an invalid memory access in get2b().

The get1b() function follows the same pattern as get2b(), but that code works correctly. Given the possibility of UB, how can I be sure that a constructed range is legal? My paranoid side wants to write get1b() as return std::move(rng) | ....

https://www.godbolt.org/z/Y77YW3jYb

#include <array>
#include <ranges>
#include <algorithm>
#include <iostream>
#include <iterator>
#include "boost/range/join.hpp"
#include "boost/range/adaptor/transformed.hpp"

inline auto square = [](int x) { return x*x; };

struct A {
    std::array<std::vector<int>, 3> m_data{ std::vector{1, 2}, std::vector{3, 4, 5}, std::vector{6} };
    auto join1() const { return m_data | std::views::join; }
    auto join2() const { return boost::join(m_data[0], boost::join(m_data[1], m_data[2])); }
    auto get1a() const { return join1() | std::views::transform(square); }
    auto get1b() const { auto rng = join1(); return rng | std::views::transform(square); }
#if __GNUC__ >= 12
    auto get2a() const { return join2() | std::views::transform(square); }
#endif
    auto get2b() const { auto rng = join2(); return rng | std::views::transform(square); }
    auto get2c() const { auto rng = join2(); return rng | boost::adaptors::transformed(square); }
};

template<std::ranges::range R>
void print(R&& r)
{
    std::ranges::copy(r, std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';
}

int main()
{
    print(A{}.get1a());
    print(A{}.get1b());
#if __GNUC__ >= 12
    print(A{}.get2a());
#endif
    // print(A{}.get2b()); <-- undefined behavior
    print(A{}.get2c());
    return 0;
}

P.S. The only reason why I even bothered to do this is because some of my team members were affected by Intellisense to not work properly on the IDE, which is ultimately caused by https://github.com/llvm/llvm-project/issues/44178. (sigh)

MarkB
  • 672
  • 2
  • 9
  • "*My paranoid side wants to write get1b() as return std::move(rng) | ....*" Yep, for `get2b()`, you need to do this. – 康桓瑋 Oct 25 '22 at 01:28
  • @康桓瑋, that just makes it like `get1a()`. One results in `std::ranges::owning_view` and the other results in `std::ranges::ref_view`. I understand why `owning_view` works correctly. However, the original question still remains. – MarkB Oct 25 '22 at 12:26

1 Answers1

2

Mixing and matching Boost.Ranges and C++20 Ranges/range-v3 is just asking for trouble - the models there are just fundamentally different.

In Boost.Ranges, everything is in the iterator. What r | adaptors::transformed(f) does is create a new object that is basically holding two iterators: a transformed iterator from begin(r) and a transformed iterator from end(r). It does this regardless of range type. Since all the information is in the iterator, iterators get arbitrarily large and expensive, but you don't have to worry about the intermediate ranges staying in scope. In C++20 parlance, r | adaptors::transformed(f) basically creates ranges::subrange(transform_iterator(ranges::begin(r), f), transform_iterator(ranges::end(r), f)).

So this works:

auto get2c() const { auto rng = join2(); return rng | boost::adaptors::transformed(square); }

even though the boost join adaptor goes out of scope, since the boost transformed adaptor takes the iterators from rng, and that's where all the information is anyway. Nothing in this result refers to rng in any way after construction.

But in range-v3/C++20 ranges, the model is quite different. We don't "capture" iterators, we capture ranges - and we have two kinds of ranges: a view (which are always captured by value) and a non-view range (which are captured by reference if they're lvalues and, in C++20, captured by owning_view if they're rvalues). What this means is that the expression:

lvalue | std::views::meow

might capture lvalue by reference, if lvalue isn't a view, so there might be a lifetime dependency here.

In this case:

auto get1b() const { auto rng = join1(); return rng | std::views::transform(square); }

our lvalue is a view, so it is copied into the resulting transform_view, so there's no dangling here. This is fine.

In this case:

auto get2a() const { return join2() | std::views::transform(square); }

our rvalue is always captured by value, so there's no dangling here either (rvalue views are just captured as-is, rvalue non-views are first wrapped in owning_view, but the result in terms of lifetime is the same).

The reason this case is problematic:

auto get2b() const { auto rng = join2(); return rng | std::views::transform(square); }

Is because rng is a boost adapted joined range, which doesn't advertise itself as a C++20 view. Thus, because it's an lvalue, it's captured by reference. Which then is destroyed at the } and thus we have a dangling reference. This can be fixed by std::move(rng) instead, which causes rng to be owned by the transform_view - we would no longer keep a reference to it.

Boost.Ranges should probably try to mark all of its adapted ranges as views - they definitely fit the semantic criteria, since they're all just iterator pairs. Had they done so, then the above would've copied rng instead of referring to it, and would've worked fine.

But until it does so, it's really just best to not use Boost.Ranges. There are some adaptors there that doesn't exist in C++20 or C++23, but until then you can use range-v3, or simply re-implement them. It's better to stay within one model for ranges.

Barry
  • 286,269
  • 29
  • 621
  • 977