3

In C++20's std::ranges, we can expect to get views::group_by [1]. This can be very handy, but I found a problem while playing with it. Eric Niebler's manual says: "In essence, views::group_by groups contiguous elements together with a binary predicate." Let's inspect an example. I have a std::vector of ints and I want to group it into two ranges, one for even and one for odd numbers. My initial approach was to simply do:

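#include <iostream>
#include <vector>
#include <range/v3/all.hpp>

int main() {
    using namespace ranges;

    std::vector<int> ints = {3, 9, 12, 10, 7, 5, 1, 4, 8};

    // Group elements for which the predicate returns true.
    for (auto rng : ints | views::group_by(
            [](auto lhs, auto rhs) {
                const bool leftEven = lhs % 2 == 0;
                const bool rightEven = rhs % 2 == 0;

                // True when both elements have the same parity.
                return (leftEven && rightEven) || (!leftEven && !rightEven);
            })) {
        std::cout << rng << '\n';
    }
}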

But that can't work. Or rather, it will work, but the results will be unexpected for anyone familiar with similar operations in other languages (or even other APIs). The output of this program is:

[3,9]
[12,10]
[7,5,1]
[4,8]

Even and odd numbers are not all grouped, because they are not all contiguous. 3 and 9 are paired together because they are both odd and contiguous. The same goes for 12 and 10, except that they are even. But 7, 5 and 1 form a separate group: they aren't grouped with 3 and 9, which is neither what I want nor what I expect.
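
To make the "contiguous only" behaviour concrete, here is a minimal standard-library sketch of the same semantics. The names group_adjacent and same_parity are hypothetical helpers, not part of range-v3, and the function is eager rather than a lazy view. It compares direct neighbours; as far as I know, range-v3's group_by compares against the first element of the current group instead, but for an equivalence relation such as parity both give the same runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a range-v3 API): eagerly splits `input` into runs
// of adjacent elements for which `same_group(previous, current)` holds.
template <typename T, typename Pred>
std::vector<std::vector<T>> group_adjacent(const std::vector<T>& input, Pred same_group) {
    std::vector<std::vector<T>> groups;
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Open a new group at the first element, or whenever the predicate
        // rejects the neighbouring pair.
        if (i == 0 || !same_group(input[i - 1], input[i])) {
            groups.emplace_back();
        }
        groups.back().push_back(input[i]);
    }
    return groups;
}

// The predicate from the question, reduced to a parity comparison.
inline bool same_parity(int lhs, int rhs) {
    return (lhs % 2 == 0) == (rhs % 2 == 0);
}
```

Calling group_adjacent(ints, same_parity) on the vector from the question yields four groups, because a group ends as soon as the parity changes, no matter what appeared earlier in the sequence.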

What we could of course do is partition the ints vector so that the evens and odds form two contiguous groups. The problem is... there is no views::partition in ranges. That leaves me with two options, neither of which particularly appeals to me:

1. std::ranges::partition before viewing the vector:

Calling:

ranges::partition(ints, [](auto elem) { return elem % 2 == 0; });

just before our range-based for loop gives the desired output:

[8,4,12,10]
[7,5,1,9,3]

I don't like it because it lacks composability, one of ranges' key selling points. To be honest, I don't want to partition the vector either. I just want to print its elements in two groups: evens and odds.
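
If the objection is mainly to mutating the vector, one possible workaround is std::partition_copy, which writes the two groups into separate containers and leaves the input untouched. This is only a sketch: split_even_odd is a hypothetical helper name, it is neither lazy nor composable, and it pays for two extra vectors.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: copies evens into .first and odds into .second.
// The input is not modified and relative order is preserved in each group.
inline std::pair<std::vector<int>, std::vector<int>>
split_even_odd(const std::vector<int>& input) {
    std::pair<std::vector<int>, std::vector<int>> result;
    std::partition_copy(input.begin(), input.end(),
                        std::back_inserter(result.first),   // predicate true
                        std::back_inserter(result.second),  // predicate false
                        [](int elem) { return elem % 2 == 0; });
    return result;
}
```

Unlike an in-place partition, this keeps the elements of each group in their original order.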

2. Use actions::sort and sort the vector using an even-odd comparator:

#include <iostream>
#include <vector>
#include <range/v3/all.hpp>

int main() {
    using namespace ranges;

    std::vector<int> ints = {3, 9, 12, 10, 7, 5, 1, 4, 8};

    // Orders every even number before every odd number.
    auto evens_first = [](auto lhs, auto rhs) { return lhs % 2 == 0 && rhs % 2 != 0; };

    // Sort in place first, then group the now-contiguous evens and odds.
    for (auto rng : (ints |= actions::sort(evens_first)) | views::group_by(
            [](auto lhs, auto rhs) {
                const bool leftEven = lhs % 2 == 0;
                const bool rightEven = rhs % 2 == 0;

                return (leftEven && rightEven) || (!leftEven && !rightEven);
            })) {
        std::cout << rng << '\n';
    }
}

Note that the parentheses around the |= expression are required: | binds more tightly than |=, so without them the pipe is evaluated first and the above code ends up printing the sorted elements of the vector, completely ignoring the grouping (???).

This approach is okaaaay, but still not great. I'd much prefer to either have a group_by that could, for example, take a value and return a key (Java's and C#'s approach of handling grouping) or anyhow take the whole range into account, or at least have actions::partition available.

Side note: I see the rationale behind views::group_by working only with contiguous elements. It's the most efficient way: no need to store anything, no need to go back or look further ahead. That's okay, and sometimes it's the best tool for the job. But I believe it creates confusion by being counterintuitive for people who have worked with similar APIs in the past.

And to finally repeat the question - is there any more concise way of doing what I want, based on the examples and desired approaches I proposed?


[1] I can't find it on cppreference, but I think I saw confirmation somewhere that it's in. Please correct me if I am mistaken.

Fureeish

4 Answers

1

group_by in those other languages does not take a comparator for two elements, but a projection from the element to a tuple of relevant values, on which hashing and comparison work. Also, they are free to allocate additional memory to get the job done.

Here, the principle of "you don't pay for what you don't use" admittedly reduces your comfort slightly. You have to do that step explicitly if you need it, instead of the language forcing it on you even when it's just useless make-work.
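
The explicit step could look roughly like the sketch below: a key projection plus a hash map, which is where the extra allocation comes from. group_by_key is a hypothetical name rather than a range-v3 or standard facility, and the sketch assumes the key type is hashable with std::hash.

```cpp
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: groups the whole input by the key that `key_of`
// returns for each element. Allocates one bucket per distinct key.
template <typename T, typename KeyFn>
auto group_by_key(const std::vector<T>& input, KeyFn key_of) {
    // Key type is whatever the projection returns (assumed hashable).
    using Key = std::decay_t<decltype(key_of(std::declval<const T&>()))>;
    std::unordered_map<Key, std::vector<T>> groups;
    for (const T& elem : input) {
        groups[key_of(elem)].push_back(elem);
    }
    return groups;
}
```

With a projection such as [](int v) { return v % 2 == 0; } this produces exactly two buckets for the question's vector, regardless of where the elements sit in the input. The iteration order of the buckets is unspecified, which is part of the price of hashing.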

Deduplicator
0

So you need a 'group by' operator in the SQL sense, right? Just like the sort action, a 'group by' operator in the SQL sense is an offline operator: it cannot emit the first output element without seeing the last input element. Its offline nature makes it intrinsically uncomposable.

Currently, ranges supports that behavior via a sort action followed by a group_by view. That is one way of implementing the 'group by' operation in the SQL sense.

You could certainly develop another implementation via a custom action that plays the same role as sort (but works differently), and compose that with views::group_by.

For example, std::partition is one method, or you could develop a better, hash-based split method that groups the input range into a range of buckets, where the elements in each bucket share the same hash value (determined by a lambda passed into the split method).

0

I currently have the same issue: I have a vector of objects, each of which has a member that signifies which group I need it in. The best solution I have found so far is:

  1. Sorting them by the group id member
  2. Separating / chunking them using ranges::views::chunk_by. The function provided to chunk_by then simply checks whether the two objects it receives have the same group id

Note that this solution works for any number of subgroups, not just two like in the example above.
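
The two steps above can be sketched with the standard library alone. The Item struct, its group_id member and the function group_items are hypothetical stand-ins for the objects described in this answer; the loop does eagerly what chunk_by would do lazily.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical object type carrying the id of the group it belongs to.
struct Item {
    int group_id;
    std::string name;
};

// Step 1: sort by group id. Step 2: cut the sorted sequence wherever the id
// changes. Takes the vector by value so the caller's data stays untouched.
inline std::vector<std::vector<Item>> group_items(std::vector<Item> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const Item& a, const Item& b) { return a.group_id < b.group_id; });

    std::vector<std::vector<Item>> chunks;
    for (const Item& item : items) {
        // Start a new chunk when the previous element has a different id.
        if (chunks.empty() || chunks.back().back().group_id != item.group_id) {
            chunks.emplace_back();
        }
        chunks.back().push_back(item);
    }
    return chunks;
}
```

std::stable_sort is used so that items within a group keep their original relative order; a plain std::sort would also work if that order does not matter.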

0

This is annoying. You really want some kind of filter-like view that takes an invocable returning an enumeration and returns std::array<std::vector<T>, N>.

Or perhaps you could transform to a variant, and have a ranges::to_vector-like function that partitions a range of variants (i.e. returns tuple<vector<Ts>...>).
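
A rough, eager sketch of the first idea, assuming a hypothetical enum Parity whose enumerators double as array indices. bucket_by_parity is an invented name; nothing like it exists in range-v3 or the standard library as far as I know, and a real version would be generic over the enumeration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical enumeration; Count is a sentinel giving the number of buckets.
enum class Parity : std::size_t { Even = 0, Odd = 1, Count = 2 };

// Hypothetical helper: `classify` maps each element to a Parity value, which
// is used directly as the index of the bucket the element is copied into.
template <typename T, typename Classify>
std::array<std::vector<T>, static_cast<std::size_t>(Parity::Count)>
bucket_by_parity(const std::vector<T>& input, Classify classify) {
    std::array<std::vector<T>, static_cast<std::size_t>(Parity::Count)> buckets{};
    for (const T& elem : input) {
        buckets[static_cast<std::size_t>(classify(elem))].push_back(elem);
    }
    return buckets;
}
```

Compared with a hash map, the fixed-size array gives the buckets a deterministic order and avoids hashing, at the cost of needing the set of categories at compile time.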

Tom Huntington