4

I am trying to insert a transformed range of directory entries into a vector using its insert(const_iterator pos, InputIt first, InputIt last) member function template. Unfortunately I can't get the following code to compile under GCC 11.1.0, which should have ranges support according to https://en.cppreference.com/w/cpp/compiler_support.

#include <filesystem>
#include <vector>
#include <ranges>
#include <iterator>

namespace fs = std::filesystem;
namespace ranges = std::ranges;
namespace views = std::views;

// no solution
namespace std {
    template <typename F>
    struct iterator_traits<ranges::transform_view<ranges::ref_view<fs::directory_iterator>, F>> {
        using iterator_category = std::input_iterator_tag;
    };
}

int main() {
    std::vector<fs::path> directory_tree;

    auto subdir = fs::directory_iterator(".");
    ranges::input_range auto subdir_names = subdir
        | views::transform([](const auto& entry) { return entry.path(); /* can be more complex*/ })
        | views::common;
    
    // replacing subdir_names with subdir works
    std::input_iterator auto b = ranges::begin(subdir_names);
    std::input_iterator auto e = ranges::end(subdir_names);
    directory_tree.insert(
        directory_tree.begin(),
        b,
        e
    );
}

The error message mainly says:

error: no matching function for call to 'std::vector<std::filesystem::__cxx11::path>::insert(std::vector<std::filesystem::__cxx11::path>::iterator, std::ranges::transform_view<std::ranges::ref_view<std::filesystem::__cxx11::directory_iterator>, main()::<lambda(const auto:16&)> >::_Iterator<false>&, std::ranges::transform_view<std::ranges::ref_view<std::filesystem::__cxx11::directory_iterator>, main()::<lambda(const auto:16&)> >::_Iterator<false>&)'

and further down:

error: no type named ‘iterator_category’ in ‘struct std::iterator_traits<std::ranges::transform_view<std::ranges::ref_view<std::filesystem::__cxx11::directory_iterator>, main()::<lambda(const auto:15&)> >::_Iterator<false> >’

I have tried to add the above specialisation to std::iterator_traits for the concerning iterator type but to no avail. I want to understand why this does not compile and if possible, how it can be fixed. I want to avoid creating a temporary vector.

Let me know if more of gcc's error message is required.

neop
  • 89
  • 2
  • 9

2 Answers2

9

fs::directory_iterator is an input range. Which means that when you adapt it via transform you still get an input range (naturally). This transformed range's iterators have a postfix operator++ that returns void.

This was arguably a defect in the C++98 iterator model, which still required even input iterators to have a postfix operator++ that returned the original type back. Even if this was necessarily a dangling operation. In the C++20 iterator model, postfix-increment can return void for input iterators.

As such, the transformed range you get back (the views::common is no-op, it's already common) is a C++20 input range (as you're verifying) but it's not any kind of C++98/C++17 range because its iterators do not even satisfy Cpp17InputIterator due to that postfix-increment rule - and so its iterators don't even bother providing iterator_category.

That makes:

directory_tree.insert(directory_tree.begin(), b, e);

fail, since this function expects types that satisfy Cpp17InputIterator, and b and e do not.


The workaround is to instead do:

ranges::copy(subdir_names, std::inserter(directory_tree, directory_tree.begin()));

Or even combine the two steps:

ranges::copy(subdir
        | views::transform([](const auto& entry) { return entry.path(); /* can be more complex*/ }),
        std::inserter(directory_tree, directory_tree.begin())
);

Here, we only require the source range to be a C++20 input_range (which it is).

The intent is that soon you will be able to write:

directory_tree.insert_range(
        directory_tree.begin(),
        subdir
        | views::transform([](const auto& entry) { return entry.path(); }));

but that won't be until C++23.

Barry
  • 286,269
  • 29
  • 621
  • 977
  • Thank you so much for the explanation :) Do I understand correctly that `ranges::copy(subdir_names, std::inserter(directory_tree, directory_tree.begin()));` has complexity O(size(directory_tree) * distance(pos, end(directory_tree))) while `directory_tree.insert(directory_tree.begin(), b, e);` would be O(size(directory_tree) + distance(pos, end(directory_tree))), because `insert(iter, value)` just calls `vector::insert(iter, value)`? Under the assumption that directory_tree already has entries. – neop Aug 05 '21 at 22:45
  • @neop Why would they have different complexity? – Barry Aug 05 '21 at 23:04
  • According to cppreference `vector::insert(const_iterator pos, InputIt first, InputIt last)` is "Linear in std::distance(first, last) plus linear in the distance between pos and end of the container". `ranges::copy` is "Exactly (last - first) assignments" where assignments to `std::insert_iterator` call `vector::insert(const_iterator pos, const T& value)` which is "linear in the distance between pos and end of the container". At least that is my understanding. Range insert can probably make space for the whole (even input-) range while the copy moves every element for each inserted element. – neop Aug 05 '21 at 23:31
  • @neop Since you're dealing with input iterators, there's no way to do a single allocation internally, so it's the same. You probably want to be inserting at the end rather than the beginning (and if you really want them to end up at the front, that's a `rotate`). – Barry Aug 06 '21 at 00:00
  • If I understand you correctly I disagree. Pushing the entire range and then rotating to the correct position would have better complexity than copy(range, inserter), because the range is rotated only once and not by one on every insert. Btw. wow, rotate is everywere. But I agree the allocation size can't be known upfront. In my case I actually want to insert in the middle but I thought it was not relevant to the question. – neop Aug 06 '21 at 00:34
0

Just wanted to add an alternative workaround. Please refer to Barry's answer for the explanation of the issue. The following adapter can be used to adapt an iterator to the interface of vector::insert.

#include <filesystem>
#include <vector>
#include <ranges>
#include <iterator>

namespace fs = std::filesystem;
namespace ranges = std::ranges;
namespace views = std::views;

template <typename It>
struct insertable_iterator_adapter : It {
    using iterator_category = std::input_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using value_type = std::decay_t<decltype(*std::declval<It>())>;
    using pointer = value_type*;
    using reference = value_type&;
};

int main() {
    std::vector<fs::path> directory_tree;

    auto subdir = fs::directory_iterator(".");
    ranges::input_range auto subdir_names = subdir
        | views::transform([](const auto& entry) { return entry.path(); });

    directory_tree.insert(
        directory_tree.begin(),
        insertable_iterator_adapter{ranges::begin(subdir_names)},
        insertable_iterator_adapter{ranges::end(subdir_names)}
    );
}

Although I'm not sure to how many other algorithms this solution applies.


Edit: I don't think using std::inserter as in

ranges::copy(subdir_names, std::inserter(directory_tree, directory_tree.begin()));

is a good idea when directory_tree already has values, since the complexity is O(n * k) with n = previous directory_tree.size() and k = subdir_names.size()

According to cppreference vector::insert(const_iterator pos, InputIt first, InputIt last) is "Linear in std::distance(first, last) plus linear in the distance between pos and end of the container". ranges::copy is "Exactly (last - first) assignments" where assignments to std::insert_iterator call vector::insert(const_iterator pos, const T& value) which is "linear in the distance between pos and end of the container" This is because when inserting a single element at a time the vector needs to shift everything afterwards by one everytime.

On the other hand vector::insert(const_iterator pos, InputIt first, InputIt last) is "Linear in std::distance(first, last) plus linear in the distance between pos and end of the container" or O(k + n).

Another alternative with the same O(k + n) complexity is

ranges::copy(subdir_names, std::back_inserter(directory_tree));
std::rotate(
  ranges::begin(directory_tree),
  ranges::begin(directory_tree) + previous_size,
  ranges::end(directory_tree)
);

If your vector was empty previously then there is no complexity difference between the three options, but then there is also no reason to insert instead of push_back.

Benchmark highlighting the complexity problem.

To me the option with the adapter is the clearest and also has good complexity. If you know how to write a more general (better) adapter for Cpp17InputIterator please let me know.

neop
  • 89
  • 2
  • 9