4

Every now and then I need to iterate over a subset of the elements of a container or just want to extract them and neglect the rest. I end up using boost::range::adaptors::filtered to create this lazy collections.

for(auto&& i : container | filtered(predicate)) {
  // .. do stuff
}

Is there a reason for the lack of a collect algorithm (as in Ruby's collect) in the STL (we only have copy_if which is not the same)? Or any reason against using it?

A possible implementation could be:

template<class Container, class Predicate>
Container collect(Container&& c, Predicate&& p) {
  Container n;
  for(auto&& i : c) {
    if(p(i)) {
      n.push_back(i);
    }
  }
  return n;
}

but a lazy_collect might also be useful to avoid copying.

All answers below are great. I wish I could mark all of them. I didn't knew about std::back_inserter. Collecting stuff is now as easy as:

boost::copy( orig | filtered(predicate), std::back_inserter(collection));
gnzlbg
  • 7,135
  • 5
  • 53
  • 106
  • Are you perhaps looking for `std::copy_if()` defined in ``? – Angew is no longer proud of SO Mar 21 '13 at 10:12
  • @Angew seems that you posted just as I added that `copy_if` is not the same. When using `copy_if` you need to know before hand how much space you will need, resulting in `n container(count_if(old,predicate)); std::copy_if(old,n,predicate);`. – gnzlbg Mar 21 '13 at 10:16
  • 1
    Just as my asnwer shows, you can use `back_inserter` to bypass the apriori size knowledge requirement. – Angew is no longer proud of SO Mar 21 '13 at 10:20
  • Whats so bad in using boost range adaptors? they seem like a flexible solution here. Just like with unix cmd line tools it is often much more useful to have several small tools you can stack together than a big collections for functions that implement all combinations. – PlasmaHH Mar 21 '13 at 10:21
  • 2
    @PlasmaHH: Surely the essence of the question here is, "what is so bad about boost range adaptors that they didn't make it into the standard?" :-) – Steve Jessop Mar 21 '13 at 10:22
  • 1
    @SteveJessop: If there is a proposal, then its documented why not. If there is not, then thats the reason ;) – PlasmaHH Mar 21 '13 at 10:23
  • @PlasmaHH: Right, the perfect answer might be if someone knowledgeable about Boost activities can give an insight as to why Boost folks proposed other Boost components and not this one. It might for example be that the standard is far more conservative about funky operator overloads than Boost, but that's pure speculation. – Steve Jessop Mar 21 '13 at 10:26
  • 1
    @SteveJessop: indeed. however it sometimes is just that there is not enough time. Someone should pay an iso team fulltime... – PlasmaHH Mar 21 '13 at 10:27

3 Answers3

6

Standard algorithms don't operate directly on containers (creating or destroying them, or changing their sizes). They operate on iterator ranges, and the only changes the algorithm make are via assignment through iterators.

So, your proposed operation is entirely outside the scope of the standard algorithms. Maybe there "should" be a large extra set of generic container-operations in the standard, including this one, but the "STL philosophy" is to operate on iterators.

The non-lazy operation you propose can be done using std::back_inserter and std::copy_if (optionally using move iterators) or move_if (if you roll that yourself). std::copy_if was missing from C++03, but that was an accidental oversight.

The lazy version can't be done just by plugging together standard components -- there's no clean way to chain the "output" of one algorithm straight into the "input" of another algorithm without intermediate storage. That's why Boost provides such an interesting variety of iterators, as well as Ranges.

I don't know why they weren't incorporated into C++11, but one thing to note about C++11 was that it was rather late. Proposals were dropped for lack of time, so the reason might be as simple as "it was never considered important enough to propose, given the known existing workload".

Regarding your particular implementation, not all containers have push_back, and so you probably shouldn't use Container as the template parameter name. PushBackSequence would be more descriptive of the requirements.

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • 2
    Also note, if a container has `insert` rather than `push_back`, you can use [`inserter`](http://en.cppreference.com/w/cpp/iterator/inserter) instead of `back_inserter`. – Peter Wood Mar 21 '13 at 10:40
  • Careful using `std::copy_if` with move iterators, as you could have valid but unspecified state objects scattered around in your container. Better to roll your own `move_if` and have it move all the valid but unspecified objects to the end and return the iterator to where that point begins allowing you to assign or erase those items like is done with `std::remove_if`. – Adrian Apr 01 '19 at 19:26
5

How about this:

#include <algorithm>
#include <iterator>

std::copy_if(container.begin(), container.end(), std::back_inserter(result), []{...})

Where container is the container you want to collect from, and result is the container which will store the collection.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
5

From the comments:

When using copy_if you need to know before hand how much space you will need

That's not true. You can use std::copy_if with a back inserter:

#include <algorithm> // For std::copy_if
#include <iterator> // For std::back_inserter
#include <vector>
#include <iostream>

int main()
{
    std::vector<int> v(10);
    std::iota(begin(v), end(v), 0); // Fills the vector with 0..9

    std::vector<int> out;
    std::copy_if(begin(v), end(v), back_inserter(out), [] (int x) 
    //                             ^^^^^^^^^^^^^^^^^^
    {
        return x > 4; 
    });

    for (auto x : out) { std::cout << x << " "; }
}

Here is a live example.

And if you want a function that works directly on a container rather than on a range defined by a pair of iterators, you could write the following helper:

template<typename C1, typename C2, typename P>
void cont_copy_if(C1&& src, C2&& dst, P&& p)
{
    std::copy_if(
        begin(std::forward<C1>(src)),
        end(std::forward<C1>(src)),
        back_inserter(std::forward<C2>(dst)),
        std::forward<P>(p)
        );
}

Which you would just use this way:

int main()
{
    std::vector<int> v(10);
    std::iota(begin(v), end(v), 0);

    std::list<int> out;
//  ^^^^^^^^^
//  You could use a different destination container

    cont_copy_if(v, out, [] (int x) { return x > 4; });
//  ^^^^^^^^^^^^

    for (auto x : out) { std::cout << x << " "; }
}

And of course the live example.

Andy Prowl
  • 124,023
  • 23
  • 387
  • 451
  • Indeed, thanks! I didn't knew about back_inserter. Now I can just do `boost::copy( old | filtered(pred), std::back_inserter(new))`. – gnzlbg Mar 21 '13 at 10:23
  • Be careful calling std::forward twice on the same variable (src). If src is bound to an r-value, then you're effectively moving src into begin(), and thus it's dangerous to pass it into end(). – bstamour Mar 21 '13 at 15:46
  • @bstamour: Why would I be moving? `begin()` simply invokes the `begin()` member function on its argument if it exists – Andy Prowl Mar 21 '13 at 15:49
  • In that case, why are you using std::forward at all if you don't ever want to move it? – bstamour Mar 21 '13 at 15:51
  • @bstamour: Indeed in this case it makes no difference, but I like to highlight the fact that I am forwarding arguments just as I received them: what was an lvalue stays an lvalue, what was an rvalue stays an rvalue. – Andy Prowl Mar 21 '13 at 15:53
  • Would't that be dangerous though? Consider void foo(myobject&& x); bar(T&& y) { foo(std::forward(y)); foo(std::forward(y)); } After the first call you've moved the data out of y and into the parameter of foo. Therefore the second call to foo won't do what you expect it to do. That's the point I was trying to raise. – bstamour Mar 21 '13 at 15:55
  • err. make foo(myobject x); instead. – bstamour Mar 21 '13 at 15:57
  • @bstamour: No, after the first call you haven't moved anything. All `std::forward` does is to change its argument into an lvalue reference to `A` (`A&`) if `T` is `A&`, and into an rvalue reference to `A` (`A&&`) if `T` is `A`. This value is then bound to the (lvalue or rvalue) reference `x`, and this is in turn used. If it was used this way: `foo(myobject&& x) { myobject y = x; }`, than that would generate a move, as you said. That would be dangerous. But this: `foo(myobject&& x) { x.bar(); }` won't generate any move per se, so it's safe. But it's true that using `forward` is not needed. – Andy Prowl Mar 21 '13 at 15:59
  • I don't buy it. Maybe I'm misunderstanding (quite possible.) If foo receives an r-value (y after std::forwarded) then y has been moved into x. If this isn't the case then perfect forwarding really has no meaning, as the data perfectly forwarded has a copy somewhere (still residing inside y.) EDIT: you may have missed my correction. Assume foo has the following signature: void foo(myobject x); – bstamour Mar 21 '13 at 16:06
  • @bstamour: In *your* example, `x` and `y` are just *references*. There's no move involved in binding references. Constructing or assigning a variable from an rvalue reference is what causes the move construction or move assignment (instead of copy construction or copy assignment). – Andy Prowl Mar 21 '13 at 16:09
  • See my edit. Assume foo has the signature void foo(myobject x); In that case it would be unwise to forward something into foo twice from the same variable without reassigning that variable first. i.e. if begin() and end() took their parameters by-value, then your code would cause odd behaviour. They accept by reference so it's okay however, but therefore the forwards are meaningless anyways, so it's moot. My argument is that in general it's unwise to forward something twice without re-assigning it in-between, as you don't know if it will be moved down the road. – bstamour Mar 21 '13 at 16:13
  • @bstamour: Well, of course if `foo()` takes by *value*, then it is dangerous, yes. But that's not the case, because `begin()` and `end()` do not (it would be non-sense if they did). I agree that if I were to call functions I do not know (e.g because they are provided as arguments), I wouldn't forward twice. In this case I think it is arbitrary, and I admit it is not needed. – Andy Prowl Mar 21 '13 at 16:18
  • Alrighty, we're on the same page finally. That's effectively all I wanted to point out: in general it's not wise to forward twice in a row, unless you're certain that there will be no move construction within the function you're forwarding into. – bstamour Mar 21 '13 at 16:22