
Suppose that I have a vector of something:

std::vector<Foo> v;

This vector is sorted, so equal elements are next to each other.

What is the best way to get all iterator pairs representing ranges with equal elements (using the standard library)?

while (v-is-not-processed) {
    iterator b = <begin-of-next-range-of-equal-elements>;
    iterator e = <end-of-next-range-of-equal-elements>;

    for (iterator i=b; i!=e; ++i) {
        // Do something with i
    }
}

I'd like to know how to get values of b and e in the code above.

So, for example, if v contains these numbers:

 index 0 1 2 3 4 5 6 7 8 9
 value 2 2 2 4 6 6 7 7 7 8

Then I'd like b and e to point to these positions in each iteration:

 iteration  b  e
 1st        0  3
 2nd        3  4
 3rd        4  6
 4th        6  9
 5th        9 10

Is there an elegant way to solve this with the standard library?

JeJo
geza
  • It should be noted the code in the question is not a good example of how this could be useful. Nothing is done with `e` other than bounding the inner loop, and the inner loop could just as well be bounded by testing for a new value in element `i`. So any effort expended finding `e` serves no purpose (unless computation of the “value” used to sort `v` is so excessively expensive that a binary search for `e` would be cheaper than testing each element as we go). – Eric Postpischil Jul 02 '19 at 21:09
  • @EricPostpischil: That's true. But even if we don't use `e` for anything, this formulation is convenient, it's harder to make an error. The other way (to check for changing values) is more tedious (as we need to handle the last range specially - or do you know some tricks to avoid it?). – geza Jul 02 '19 at 21:22
  • @geza No special treatment would be required for the last range. – Walter Jul 10 '19 at 08:39

6 Answers


This is basically range-v3's group_by: group_by(v, std::equal_to{}). It doesn't exist in the C++17 standard library, but we can write our own rough equivalent:

#include <algorithm>  // std::find_if_not
#include <iterator>   // std::next

template <typename FwdIter, typename BinaryPred, typename ForEach>
void for_each_equal_range(FwdIter first, FwdIter last, BinaryPred is_equal, ForEach f) {
    while (first != last) {
        auto next_unequal = std::find_if_not(std::next(first), last,
            [&] (auto const& element) { return is_equal(*first, element); });

        f(first, next_unequal);
        first = next_unequal;
    }
}

Usage:

for_each_equal_range(v.begin(), v.end(), std::equal_to{}, [&] (auto first, auto last) {
    for (; first != last; ++first) {
        // Do something with each element.
    }
});
Justin
  • If you know that these subranges of equal elements are large, you may benefit by probing through some sort of binary search. – Justin Jul 02 '19 at 20:52
  • I like this solution the most, as its usage is the clearest one. – geza Jul 03 '19 at 06:44

You can use std::upper_bound to get the iterator to the "next" value. Since std::upper_bound returns an iterator to the first element greater than the value provided, passing it the value of the current element gives you an iterator one past the last element equal to the current value. That gives you a loop like:

#include <algorithm>  // std::upper_bound

auto it = v.begin();
while (it != v.end()) {
    auto b = it;
    auto e = std::upper_bound(it, v.end(), *it);  // first element greater than *it

    for (auto i = b; i != e; ++i) {
        // do something with i
    }
    it = e; // need this so the loop starts on the next value
}
Justin
NathanOliver
  • The only problem is that `std::upper_bound` does a little bit more, because it has to find the element with binary search. But in my case, this is unneeded. – geza Jul 02 '19 at 20:55
  • This makes sense if the subranges of equal elements are large. If they are small, you waste effort doing a binary search over the entire range, when a linear search could find the next element faster (and with better cache locality). – Justin Jul 02 '19 at 20:55
  • @geza If you want to do a linear traversal instead then you can replace `std::upper_bound(it, v.end(), *it);` with `std::find_if(it, v.end(), [=](auto e) { return *it != e; });`. Depending on the data this could definitely be faster. – NathanOliver Jul 02 '19 at 20:59
  • +1, thanks, but I accepted Justin's solution, as it is a little bit clearer at the point of use (I mean, the algorithm has a name, so it is easier to understand what the code does; but of course, your variation can be modified this way as well, and your solution is basically the same). – geza Jul 03 '19 at 06:47

You are looking for std::equal_range.

Returns a range containing all elements equivalent to value in the range [first, last).

Something like the following should work.

#include <algorithm>  // std::equal_range

auto it = v.begin();
while (it != v.end())
{
    auto [b, e] = std::equal_range(it, v.end(), *it);  // C++17 structured bindings
    for (; b != e; ++b) { /* do something in the range [b, e) */ }
    it = e;             // the next std::equal_range starts here
}

Remark: Although this is an intuitive approach, std::equal_range effectively obtains its first and second iterators (i.e. b and e) from std::lower_bound and std::upper_bound, which makes it slightly inefficient here. Since the first iterator is already known in the OP's case (it is simply it), only a call to std::upper_bound for the second iterator is necessary (as shown in @NathanOliver's answer).

JeJo
  • This does some extra work to find the lower bound of the range when we know that it's just `it`, but at that point, we'd be the same as NathanOliver's answer (`std::upper_bound` instead of `std::equal_range`). – Justin Jul 02 '19 at 21:09
  • @Justin Agreed. Maybe a slight advantage: *less typing* thanks to structured bindings. – JeJo Jul 02 '19 at 21:12
  • +1, but I've accepted Justin's solution because, while this is the shortest version (and easy to understand without adding a name), it has the small problem of unnecessary work done by `std::equal_range`. – geza Jul 03 '19 at 06:48

If your ranges of equal values are short, then std::adjacent_find works well:

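#include <algorithm>   // std::adjacent_find
#include <functional>  // std::not_equal_to

for (auto it = v.begin(); it != v.end();) {
    // adjacent_find returns the last element of the current run (or v.end()),
    auto next = std::adjacent_find(it, v.end(), std::not_equal_to<Foo>());
    if (next != v.end())
        ++next;  // so step one past it to reach the end of the run
    for (; it != next; ++it) {
        // do something with *it
    }
}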

You can also substitute a lambda for std::not_equal_to if you wish.

Kyle

But even if we don't use e for anything, this formulation is convenient, it's harder to make an error. The other way (to check for changing values) is more tedious (as we need to handle the last range specially [...])

That depends on how you interpret 'handling the last range specially':

auto begin = v.begin();
if (begin != v.end())
{
    // initialize whatever is needed for the first range using *begin
    for (auto i = begin; ; ++i)
    {
        if (i == v.end() || *i != *begin)
        {
            // the range [begin, i) is complete: finish it here
            if (i == v.end())
                break;
            begin = i;
            // re-initialize for the next range using *begin
        }
        // handle the single element *i of the current range
    }
}

There's no special handling for the last range; at most, the initialization code may need to appear twice.

Nested-loop approach:

auto begin = v.begin();
if (begin != v.end())
{
    for (;;)
    {
        // initialize the first/next range using *begin
        for (auto i = begin; ; ++i)
        {
            if (i == v.end() || *i != *begin)
            {
                // the range [begin, i) is complete: finish it here
                if (i == v.end())
                    goto LOOP_EXIT;
                begin = i;
                break;
            }
            // handle the single element *i of the current range
        }
    }
}
LOOP_EXIT:
// go on
// if there is nothing left to do in the function, we might prefer returning over goto...

More elegant? Admittedly, I have my doubts myself. Both approaches do, however, avoid iterating over the same range twice (first to find the end, then for the actual iteration). And if we turn this into our own library function:

template <typename Iterator, typename RangeInitializer, typename ElementHandler>
void iterateOverEqualRanges
(
    Iterator begin, Iterator end,
    RangeInitializer ri, ElementHandler eh
)
{
    // whichever of the two approaches you prefer,
    // or your own variation of it
}

we could then use it like:

std::vector<...> v;
iterateOverEqualRanges
(
    v.begin(), v.end(),
    [] (auto begin) { /* ... */ },
    [] (auto current) { /* ... */ }
);

Now, finally, it looks similar to e.g. std::for_each, doesn't it?

Aconcagua
  • Thanks for the solution, I like that it doesn't need double iteration over the elements. By "handling last range specially" I meant that we need to check for it somehow, even if only by doing the `i == v.end()` check twice. – geza Jul 03 '19 at 06:52
for(auto b=v.begin(), i=b, e=v.end(); i!=e; b=i) {
    // initialise the 'Do something' code for another range
    for(; i!=e && *i==*b; ++i) {
        // Do something with i
    }
}
Walter