
I am trying to create a data structure for arrays of dynamic-sized arrays. Multiple choices are possible, the simplest one being std::vector<std::vector<T>>. However, it is often not efficient (every inner vector is a separate heap allocation), so we would like to compress the data of all the inner vectors into one big vector, and have a vector of offsets telling where each inner array begins.

Example:

  // encoding of: | 4.,5.,1. | 7.,8.,9.,2. |
  std::vector<double> v    = {4.,5.,1., 7.,8.,9.,2.};
  std::vector<int> offsets = {0       , 3         , 7};

Let's encapsulate it! Consider the following data structure:

(note: the code is neither complete, general, nor precise; at this point it is just meant to give an idea of what is going on):

  class vblock_vector {
    private:
      std::vector<double> v;
      std::vector<int> offsets; // offsets.size() == number of blocks + 1

    public:
      using iterator = vblock_iterator;
      auto begin() -> iterator {
        return {v.data(),offsets.data()};
      }
      auto end() -> iterator {
        // the last offset only closes the last block, it does not start a new one
        return {v.data(),offsets.data()+offsets.size()-1};
      }
  };

A basic implementation of the iterator type is the following:

  struct vblock_iterator {
    private:
      double* ptr;
      int* offsets_ptr;

    public:
      using reference = span_ref<double>; // see notes (0) and (1)
      // using value_type = ???; // See below

      vblock_iterator(double* p, int* o) : ptr(p), offsets_ptr(o) {}

      auto operator++() -> vblock_iterator& { // pre-increment returns *this by reference
        ++offsets_ptr;
        return *this;
      }
      auto operator*() const -> reference {
        // the current block is [ptr+offsets_ptr[0], ptr+offsets_ptr[1])
        return reference(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
      }
      auto operator<=>(const vblock_iterator&) const = default;

      // ... other iterator interface stuff that is trivial
  };

This iterator works with e.g. std::copy. (4)

Now let's say that I want to replace my old std::copy calls with std::ranges::copy. For that, vblock_iterator needs to satisfy the std::input_iterator concept. In order to do that, vblock_iterator needs to have an associated value_type (required by the intermediate std::indirectly_readable concept).

An obvious choice would be `using value_type = std::vector<double>;` (2), but I certainly don't want to give std::ranges::copy the freedom to use this type at its discretion in its implementation: it would be inefficient.

My question is the following: why does std::input_iterator<In> require In to have a value_type? At least for copying, it is not needed (the fact that I can use std::copy and that it does the right thing proves it). Of course, one can say: "define value_type to be anything, it won't be used by std::ranges::copy implementations anyway", but then why require it?

I am currently under the impression that value_type is mandatory for e.g. std::swappable, but not for std::input_iterator (nor even, dare I say, for std::random_access_iterator). But the standards committee decided otherwise: what is the reason behind this choice? (3)

Notes:

(0) span_ref is just like a std::span with reference semantics (its operator= is "assign-through" and not "rebind to new array").

(1) In reality, the reference type needs to be a tad more complex to account for offsets, but that is not the subject here. Suffice it to say that it is possible to have an efficient reference type for this structure.

(2) And I think this is the only reasonable choice: at the very least, an owning container is needed (vector, deque...). E.g. a std::span won't do: if we bother to save the value pointed to by the iterator, it is because we are going to modify the original memory, and a std::span would still point into that memory.

(3) In the presentation of the std::indirectly_readable concept (then called Readable), Eric Niebler goes into some detail about why value_type needs to be related in some form to reference for proxy references to work well, but I still don't see why we would even need value_type for algorithms that don't need to swap elements (or store them somewhere). Yes, there is mathematically a value_type for vblock_iterator, but why require it if it is not meant to be used? (Similarly, there is also a mathematical operator+= for forward iterators, but since it is inefficient, it is simply not required.)

(4) And other algorithms: std::move, std::find, std::find_if, std::any_of, std::partition_point, std::lower_bound, std::unique... So I think that there is something more fundamental going on than: "we are just lucky with std::copy".

Bérenger
  • The legacy input iterator concept also requires `value_type`, but in contrast to the ranges algorithms with constraints on the type, `std::copy` is not required to diagnose its absence. It will technically just be UB to use the type as iterator in `std::copy` since it violates its preconditions. – user17732522 Mar 15 '22 at 20:38
  • More Hyrum's law than UB – Bérenger Mar 15 '22 at 20:44
  • To be clear, `value_type` doesn't necessarily need to be a member of `vblock_iterator`. It can alternatively be defined in `std::iterator_traits`. If you don't specialise the trait, then `vblock_iterator::value_type` will be used. I don't know the answer to the question for sure, but I would guess that it's in order for `std::input_iterator` to be a drop-in replacement for Cpp17InputIterator. – eerorika Mar 15 '22 at 20:50
  • @eerorika Yes, it should handle previous Cpp17InputIterator cases, but I am not sure Cpp17InputIterator handles proxy references to begin with (I know that this is not the case for Forward, and IIRC it is very constrained for Input). – Bérenger Mar 15 '22 at 21:12
  • It looks like what you need is [`std::mdspan`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p0009r15.html), which should probably make it into C++23. – 康桓瑋 Mar 16 '22 at 01:54
  • Not just `std::input_iterator`, [even `LegacyInputIterator` still requires `value_type`](https://en.cppreference.com/w/cpp/named_req/InputIterator), so your `vblock_iterator` doesn't model input iterator even in C++17. – 康桓瑋 Mar 16 '22 at 01:59
  • @康桓瑋 Yes, you are right regarding `LegacyInputIterator`, so I deleted its mention in the OP. Regarding `std::mdspan`, I don't think it will work here since all rows (and all columns) must have the same size. The structure I am describing here is analogous to a sparse matrix in CSR format. – Bérenger Mar 16 '22 at 08:41
  • (2) is wrong, you can mutate through a `std::span`, you would need `std::common_reference_with` etc – Caleth Mar 16 '22 at 09:30
  • @Caleth No. Example: suppose you want to rotate the first element at the end. You save it into a `std::span`, then put the second element into the first one, ... until the end. Then you replace the last element from the one you saved. Except that the values inside the saved element are those of what the `std::span` is pointing to, that is, the old second element. The values of your first element are now forever lost. – Bérenger Mar 16 '22 at 09:46
  • The basic answer to (3) is we already have 6 iterator categories, which *mostly* form a hierarchy of capabilities. It's relatively tractable to know if a given iterator works with a given algorithm. If instead each algorithm asked for *only* what it needed, we'd have dozens of categories in a web of capabilities – Caleth Mar 16 '22 at 14:29
  • @Caleth Beware that with C++20, algorithms are not only relying on these iterator categories, they are using additional constraints. For instance, `std::ranges::sort` requires `std::sortable`, and `std::ranges::reverse` requires `std::permutable`. I am not advocating to add a new iterator category. I am wondering if requiring `value_type` could be moved from `std::input_iterator` to `std::permutable` (or somewhere else). – Bérenger Mar 16 '22 at 14:53
  • It might be as simple as the presence of a referenceable `value_type` being the distinguishing feature of *InputIterator* over *OutputIterator* – Caleth Mar 16 '22 at 15:10
  • @Caleth Well it is true that `std::output_iterator` (or LegacyOutputIterator for that matter) does not require `value_type`, it requires `std::indirectly_writable`, that a `std::input_iterator` does not require – Bérenger Mar 16 '22 at 19:19

1 Answer


std::copy requires a LegacyInputIterator for its iterator types. It does not check this requirement. If you fail to provide a LegacyInputIterator, your program is ill-formed, no diagnostic required.

A LegacyInputIterator requires that std::iterator_traits<X>::value_type exists because it subsumes LegacyIterator.

So your program was ill-formed once you passed it to std::copy. The behavior of your ill-formed program is not determined by the C++ standard in any way; the compiler can legally provide you a program that emails your browser history to your great aunt Eustice and be standard compliant. Or it could do something that happens to align with what you think the program "should" do. Or it could fail to compile.

The std::ranges algorithms have slightly different requirements, and these are far more likely to be checked by concepts than those of the old-style algorithms, so a violation is reported to the user as a compile-time error.

You are running into such a case.

To be even more clear, you cannot rely on the implementation of std code to enforce the standard.

These types are required partly to make it easier to talk about the types in question and what operations on them mean, semantically.

Beyond simple requirements such as the existence of std::iterator_traits<X>::value_type, there are semantic requirements on what *it does, what x = *it++ does, etc. Most of those requirements cannot be checked by the compiler (by Rice's theorem, they cannot be checked in general), but the algorithms in the std namespace rely on those semantics being correct for any iterator passed in.

Because the implementation can assume the semantic requirements hold, the algorithms can be cleaner, simpler and faster than if they had to check them. It also means that different compiler vendors can write different std algorithm implementations, competing to improve on each other, with an objective standard to argue against.

For a LegacyInputIterator it of type X, with value_type and reference taken from std::iterator_traits<X>, we must have:

  value_type v = *it;   // a valid expression, where *it must return reference
  value_type w = *it++; // *it++ must return a type convertible to value_type

Not every algorithm needs to use every property of the iterators it requires. The goal here is to have semantically meaningful categories that do not demand too much in the way of overhead.

Requiring that an iterator over stuff actually have a type it is an iterator over is not a large overhead. And it makes talking about what the iterator does insanely easier.

You could refactor it and remove that concept, or cut the concept up into smaller pieces so that the value_type is only required in the narrow cases where it is required, but that would make the concepts harder to write about and harder to understand.

Yakk - Adam Nevraumont
  • Thanks for the detailed explanation, but it does not explain *why* it is this way or *if it should* be this way. My guess is that before C++20 iterators, proxy references were not supported. So the `reference` type had to be a `value_type&` (possibly const-qualified), hence having a `value_type` was pretty natural. But I don't think that requiring the presence of `value_type` for all non-swapping standard algorithms is justified if virtually all of them don't need it – Bérenger Mar 16 '22 at 20:11
  • @Bérenger I explained that talking about iterators is easier when those types exist. That is their only purpose. That purpose has huge value. And no, the `reference` of a `LegacyInputIterator` need not be `value_type&`; that requirement shows up after `LegacyInputIterator`. `reference` must be convertible-to `value_type`. Those sentences would be very difficult to say without a type `value_type`. Could you design an entire iterator hierarchy without such aliases? Yes. These concepts are already hard, I wouldn't want to see what it looks like. – Yakk - Adam Nevraumont Mar 16 '22 at 20:19
  • Yes, sorry about `LegacyInputIterator`, I simplified a bit too much. Once again, I am not denying that these `value_type` types exist in a mathematical sense, just that they might be tricky to materialize in code... and they are not needed by this code. Why not just make it a purely semantic requirement that such a value type exists? Just as `operator+=` of a `ForwardIterator` exists semantically as `operator++` repeated N times, without being materialized. – Bérenger Mar 16 '22 at 20:26
  • Another example would be Semiregular: the same semantic requirements as Regular, but `operator==` is not required because it is tricky in some situations – Bérenger Mar 16 '22 at 20:28
  • @Bérenger `+=` does not exist intentionally on non-random access iterators, because the standard attempts to make expensive code verbose. `std::advance(x, n)` is `x+=n` for random-access iterators and `++x` repeatedly for non-random access iterators; the standard could have had `+=` do `++` repeatedly, but *intentionally did not*, because then `+=` might be surprisingly expensive. – Yakk - Adam Nevraumont Mar 16 '22 at 20:30
  • Yes, this is my point: if the conversion from a lightweight reference type to a heavyweight value type is expensive, why not do the same as what is done with `+=`? It would guarantee that e.g. `std::copy` or `std::ranges::copy` does not use `value_type`, since they don't need to – Bérenger Mar 16 '22 at 20:42
  • @Bérenger `value_type` is a type not an operation. Again, it exists to talk about the iterator and its properties. You may not like this answer, but that is the answer. Provide it or have your code be ill-formed. – Yakk - Adam Nevraumont Mar 16 '22 at 20:52
  • It is not only a type though: a `reference` should be convertible to a `value_type`, and this conversion is an operation. And it's this conversion that I want to avoid (i.e. I don't want to give the implementation the freedom to use it). By the way, any sane implementation of copy/move/find/unique had better avoid it as well (let's say for copying a range of `std::array`) – Bérenger Mar 16 '22 at 21:00
  • And I fully agree : it is ill-formed, and I have to provide `value_type` to correct it (and BTW, this is what I am doing in my code right now). But this is not the point : the point is : why is it required? – Bérenger Mar 16 '22 at 21:03
  • "*the point is : why is it required?*" Because [some algorithms](https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/ranges_algo.h#L2927) that require input iterator need to use `value_type`? – 康桓瑋 Mar 18 '22 at 02:11
  • @康桓瑋 Well, `std::min` is just a wrapper around `std::min_element` that dereferences the returned iterator into a `value_type`. So yes, by its specification, it cannot work without a `value_type`. But it works for the more fundamental `std::min_element`. We can also wonder why `std::min` returns a `value_type` instead of a reference: it might be inefficient and that should be the caller's choice. – Bérenger Mar 18 '22 at 12:27
  • Note: And the constrained range versions do the same – Bérenger Mar 18 '22 at 12:28
  • "*Well std::min is just a wrapper around std::min_element that dereferences the returned iterator into a value_type*" No, `min_element` requires `forward_iterator` because the iterator it returns must be still valid. – 康桓瑋 Mar 18 '22 at 13:34
  • "*We can also wonder why std::min returns a value_type instead of a reference*" Because returning a dangling reference is not safe. – 康桓瑋 Mar 18 '22 at 13:35
  • @康桓瑋 Good point. Seems like `std::min` only works for two arguments or an `std::initializer_list`, but `std::ranges::min` works with input ranges and indeed it makes sense to return by value because of the input iterator case – Bérenger Mar 19 '22 at 22:17