8

As I understand it, the range-v3 library's view operations (requires C++17 currently, but to become an official part of the STL in C++20) provides chainable STL-like algorithms that are lazily evaluated. As an experiment, I created the following code to evaluate the first 4 perfect numbers:

#include <iostream>
#include <range/v3/all.hpp>

using namespace std;
int main(int argc, char *argv[]) {
    auto perfects = ranges::view::ints(1)
                    | ranges::view::filter([] (int x) {
                        int psum = 0;
                        for (int y = 1; y < x; ++y) {
                            if (x % y == 0) psum += y;
                        }
                        return x == psum;})
                    | ranges::view::take(3);
    std::cout << "PERFECT NUMBERS:" << std::endl;
    for (int z : perfects) {
        std::cout << z << std::endl;
    } 
    std::cout << "DONE." << std::endl;
}

The code starts with a possible infinite range of numbers (ranges::view::ints(1)), but because the view algorithm ends with ranges::view::take(3) it should halt after finding the first three numbers passing the filter algorithm (a brute-force algorithm to filter out perfect numbers, intentionally not that efficient). Since the first three perfect numbers --- 6, 28, and 496 --- are fairly small, I expect this code to quickly find these, print "DONE." and terminate. And that's exactly what happens:

coliru -- taking 3 perfect numbers works just fine

However, let's say I want to print the first 4 perfect numbers, which are still all fairly small --- 6, 28, 496, and 8128. After printing 8128 the program does not halt and eventually has to be terminated; presumably it is vainly trying to compute the fifth perfect number, 33550336, which is beyond the ability of this brute-force algorithm to efficiently find.

coliru -- taking 4 perfect numbers tries to take 5+

This seems inconsistent to me. I would have understood if both tests had failed (concluding that I had misunderstood the lazy evaluation of range-v3's view algorithms), but the fact that take(3) succeeds and halts while take(4) does not seems like a bug to me, unless I'm misunderstanding things.

I've tried this with several compilers on wandbox and it seems to be persistent (tried clang 6.0.1 and 7.0.0, g++ 8.1.0 and 8.2.0). At least on my local computer, where I found the issue originally, version 0.3.6 of range-v3 is being used, but I'm not sure about coliru and wandbox.

wandbox link

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
jwimberley
  • 1,696
  • 11
  • 24
  • Just one comment to help people avoid going down fruitless lines of inquiry: if you add "std::cout << "Checking " << x << std::endl" into the lambda function body, you'll see the code print stuff like "Checking 46838", so clearly the lambda function is being evaluated past the 4th perfect number (and the code's not hanging for some other reason) – jwimberley Nov 16 '18 at 17:32
  • Another hint: finding numbers that *aren't* perfect also does something odd. Keeping the cout statement, changing the result of the lambda to !(x==psum), and taking the first 8124 non-perfect numbers (the last of which is 8127, to make up for not including 6, 28, or 426), the lambda function is still evaluated for 8128 and 8129, before the for loop halts. My guess it that the view needs an end iterator, and so it keeps on evaluating the lambda function until it returns "true" to get an end iterator. – jwimberley Nov 16 '18 at 17:39
  • Easy way to confirm the above, with a much simpler lambda function that returns "true" for x <= n, "false" for n < x < n+m, and "true" for x >= m: [wandbox link](https://wandbox.org/permlink/Ugc1IDPkbAQgrAgi). The lambda function is clearly called until it returns true even after taking the first n successes. The question now is *why* it does this. It's not what I would have expected; it could lead to unexpected behavior (e.g. if lambda function body isn't pure and alters captured objects] and it could have big performance implications. – jwimberley Nov 16 '18 at 17:53
  • I also see some comments in range-v3's documentation that filter is deprecated; but just want to confirm that remove_if behaves the same way. – jwimberley Nov 16 '18 at 18:12
  • Please, post all required information on your question, do not link to external resources. – Sakura Kinomoto Dec 05 '18 at 22:59
  • @SakuraKinomoto I disagree here. Linking to online compilers like coliru, godbolt, or wandbox is common practice for C++ questions on SO. I would agree that only showing code on one of these sites and not including it in the question would not be good practice (because people reading the question couldn't get a good sense of the problem from reading the question), but I haven't done that --- the code included in the question is nearly identical to the coliru/wandbox code, save for trivial and explained modifications. – jwimberley Dec 05 '18 at 23:35

2 Answers2

9

A take view that contains n elements has n + 1 valid iterator values: n that correspond to elements in the range, and the n + 1st past-the-end iterator. It is intended that iterating over the take view necessarily forms each of those n + 1 iterators - indeed, it's useful to extract the underlying iterator value adapted by the take view's end iterator to perform additional computations.

take_view doesn't know that the range it's adapting is a filter, or that your filter predicate is inordinately expensive - it simply assumes that your predicate is O(1) as is necessary for it to provide O(1) iterator operations. (Although we did forget to make that complexity requirement explicit in C++20.) This case is a very good example of why we have complexity requirements: if the iterators of the range being adapted don't meet the Standard's O(1) complexity requirements, the view can't meet its complexity guarantees and reasoning about performance becomes impossible.

Casey
  • 41,449
  • 7
  • 95
  • 125
  • 1
    Ah, thank you for the explanation! It does make sense for the end iterator to generally allow access to an underlying iterator, and that this requires the observed behavior. I'm happy that my abuse of the feature can be made an example for clarifying complexity requirements :-) – jwimberley Nov 16 '18 at 20:43
  • @jwimberley You're welcome. And thank *you* for making me realize that we forgot to impose the necessary complexity requirement on the functions used with `filter_view` and `transform_view`. – Casey Nov 17 '18 at 18:16
  • You're welcome too! More generally, mightn't the complexity requirement for the developer to ensure that for a predicate P, `while (!P(*iter)) { ++iter; }` is O(1)? A horrific mis-use like a O(1) predicate `x <= 4` would have essentially the same behavior as the predicate `perfect(x)` when take-ing 4 or more from integers `[1,infinity)`. More realistically there might be a perfectly sensible looking predicate, but if the entries passing the predicate aren't randomly distributed in the container, incrementing the iterator the next entry passing the predicate might not be O(1). – jwimberley Nov 17 '18 at 18:35
5

Apology:

I'm (partly) answering my own question because I think I've learned what is going on here, mechanically, and because the extra detail won't fit into a comment. I'm not sure of the etiquette, so if this would be better as an edit to the question --- there's still the open question of why the library is designed this way --- please suggest that in the comments I'll happily move it there.

Filtering until finding an end iterator

I don't understand the internals of range-v3 in great detail so I might not have terminology exactly right. In short, there is no inconsistent behavior here. When a call to ranges::view::take follows a call to ranges::view::filter (or ranges::view::remove_if), the resulting view object must set an end iterator at some point during the iteration to break out of the for-loop. If I'd thought about it, I would have imagined that the ranged-based for loop still expands to something like

for (auto it = std::begin(perfects); it != std::end(perfects); ++it) {
    ...
}

(which, btw, behaves identically in my examples) and that after it has found the required number of elements, at the beginning of the subsequent operator++ call on it, there would hbe special logic to make the result equal to std::end(perfects), so that the loop exits without doing any additional work. But instead, and this makes some sense from an implementation standpoint, the end iterator actually corresponds to the next element returned by the filter/remove_if view. The filter predicate keeps on looping over ranges::view::ints(1) until it finds one for which the predicate returns true; presumably this becomes the end iterator, since it is not printed in the ranged for loop.

An easy demonstration of this is provided by the following code. Here, there are two configurable integers n and m, and the predicate function in filter returns true for x <= n, false for n < x < n+m, and true for x >= m:

#include <iostream>
#include <range/v3/all.hpp>

using namespace std;
int main(int,char**) {
    int n = 5;
    int m = 3;
    auto perfects = ranges::view::ints(1)
                    | ranges::view::filter([&n,&m] (int x) {
                        std::cout << "Checking " << x << "... ";
                        if (x <= n) {
                            return true;
                        } else if (x <= n + m) {
                            std::cout << std::endl;
                            return false;
                        }
                        return true;})
                    | ranges::view::take(n);
    std::cout << "First " << n << " numbers:" << std::endl;
    for (int z : perfects) {
        std::cout << " take it!" << std::endl;
    }
    std::cout << "DONE." << std::endl;
}

You can run this code for different values of n and m here: wandbox. The default output is as follows:

First 5 numbers:
Checking 1...  take it!
Checking 2...  take it!
Checking 3...  take it!
Checking 4...  take it!
Checking 5...  take it!
Checking 6... 
Checking 7... 
Checking 8... 
Checking 9... DONE.

(I didn't rename the variable perfects; clearly it is not a set of perfect numbers anymore). Even after taking the first n successes, the lambda predicate is called until it returns true. Since the integer that returns true, 9, is not printed, it must be the std::end(perfects) that breaks the ranged for-loop.

The remaining mystery to me is why it does this. It's not what I would have expected; it could lead to unexpected behavior (e.g. if lambda function body isn't pure and alters captured objects) and it could have big performance implications, as demonstrated by the original example, which would have to perform roughly 10^15 modulo operations before reaching the integer 33550336.

jwimberley
  • 1,696
  • 11
  • 24
  • 3
    Don't apologize for coming up with your own answer. I'm glad you did, and I'm sure others will be too. – Pezo Nov 16 '18 at 19:59