43

The STL algorithms are a pretty useful thing in C++. But one thing that kind of irks me is that they seem to lack composability.

For example, let's say I have a vector<pair<int, int>> and want to transform that to a vector<int> containing only the second member of the pair. That's simple enough:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;

std::transform(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return p.second; });

Or maybe I want to filter the vector for only those pairs whose first member is even. Also pretty simple:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> result;

std::copy_if(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return (p.first % 2) == 0; });

But what if I want to do both? There is no transform_if algorithm, and using both transform and copy_if seems to require allocating a temporary vector to hold the intermediate result:

std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> temp;
std::vector<int> result;

std::copy_if(values.begin(), values.end(), std::back_inserter(temp),
    [] (std::pair<int, int> p) { return (p.first % 2) == 0; });

std::transform(values.begin(), values.end(), std::back_inserter(result),
    [] (std::pair<int, int> p) { return p.second; });

This seems rather wasteful to me. The only way I can think of to avoid the temporary vector is to abandon transform and copy_if and simply use for_each (or a regular for loop, whichever suits your fancy):

std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;

std::for_each(values.begin(), values.end(),
    [&result] (std::pair<int, int> p) 
        { if( (p.first % 2) == 0 ) result.push_back(p.second); });

Am I missing something here? Is there a good way to compose two existing STL algorithms into a new one without needing temporary storage?

Sven
  • 21,903
  • 4
  • 56
  • 63
  • 2
    You could always wrap this with your own `transform_if` implementation. Some STL algorithm's implementations are as simple as a couple loops, anyway, so you wouldn't have any hit for it not being built-in. – Merlyn Morgan-Graham Jul 19 '11 at 06:34
  • 3
    I'd rather do `for (const auto& value: values) { if ((value.first % 2) == 0) result.push_back(value.second); }` – UncleBens Jul 19 '11 at 06:35
  • 2
    Code like this makes me wish there would be a way to embed Haskell into C++ (so this code would just become `map snd . filter (even . fst)`). – Frerich Raabe Jul 19 '11 at 07:06

5 Answers5

25

You're right. You can use Boost.Range adaptors to achieve composition.

Yakov Galka
  • 70,775
  • 16
  • 139
  • 220
  • 1
    Well, it's good to know there's at least a composable alternative to STL agorithms, even if they themselves cannot be composed. :) It's still too bad that the STL by itself doesn't seem to provide any good way to do this. – Sven Jul 19 '11 at 06:41
  • 1
    +1 - very nice, I didn't know about this! Every time I'm made aware of gems like this in Boost, I tell myself "Okay, one more feature like that and I'll actually go ahead and start using boost". And then I chicken out because I've got to support a dozen strange compilers (including MSVC6, gcc2.95 and strange Sun machines). – Frerich Raabe Jul 19 '11 at 07:11
  • 1
    @Frerich Raabe: The older boost versions did support those compilers, and did so quite well. And of course by moving compiler details into boost, you don't have to care that much. – MSalters Jul 19 '11 at 07:55
  • 3
    +1 Boost has tons of really useful "advanced" generic iterator and algorithm titbits like this. I wouldn't call them "alternatives to the STL", but rather "continuations of the STL philosophy". I really hope those make it into TR2! (Another cool one is the zip iterator that turns a tuple of vectors into a vector of tuples.) – Kerrek SB Jul 19 '11 at 09:57
11

I think the problem is unfortunately structural

  1. C++ uses two iterators to represent a sequence
  2. C++ functions are single-valued

so you cannot chain them because a function cannot return "a sequence".

An option would have been to use single-object sequences instead (like the range approach from boost). This way you could have combined the result of one processing as the input of another... (one object -> one object).

In the standard C++ library instead the processing is (two objects -> one object) and it's clear that this cannot be chained without naming the temporary object.

6502
  • 112,025
  • 15
  • 165
  • 265
8

Back in 2000, the problem was already noted. Gary Powell and Martin Weiser came up with a "view" concept, and coined the name "View Template Library". It didn't take off then but the idea makes sense. A "view" adaptor essentially applies an on-the-fly transform. For instance, it can adapt the value_type.

The concept probably should be readdressed now we have C++0x. We've made quite some progress in generic programming since 2000.

For example, let's use the vector<pair<int, int>> to vector<int> example. That could be quite simple:

std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, [](std::pair<int, int> p) { return p.first }); 
std::vector<int> result(view.begin(), view.end());

Or, using the boost::bind techniques, even simpler:

std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, &std::pair<int, int>::first); 
std::vector<int> result(view.begin(), view.end());
MSalters
  • 173,980
  • 10
  • 155
  • 350
3

Since C++20 you can use std::ranges::copy together with the range adaptors std::views::filter and std::views::values from the Ranges library as follows:

int main() {
    std::vector<std::pair<int, int>> values = { {1,2}, {4,5}, {6,7}, {9,10} };
    std::vector<int> result;

    auto even = [](const auto& p) { return (p.first % 2) == 0; };
    std::ranges::copy(values | std::views::filter(even) | std::views::values,
                      std::back_inserter(result));

    for (int i : result)
        std::cout << i << std::endl;

    return 0;
}

Output:

5
7

In the solution above, no temporary vector is created for an intermediate result, because the view adaptors create ranges that don't contain elements. These ranges are just views over the input vector, but with a customized iteration behavior.

Code on Wandbox

honk
  • 9,137
  • 11
  • 75
  • 83
0

Not sure if this is still active, but... A new light wait header only lib that does what you describe. Doc talks about lazy evaluation and com compossible generators.

Doc snippet:

  • Read in up to 10 integers from a file "test.txt".
  • filter for the even numbers, square them and sum their values.
    int total = lz::read<int>(ifstream("test.txt")) | lz::limit(10) |
                lz::filter([](int i) { return i % 2 == 0; }) |
                lz::map([](int i) { return i *  i; }) | lz::sum();

you can split that line up into multiple expressions.

    auto numbers = lz::read<int>(ifstream("test.txt")) | lz::limit(10);
    auto evenFilter = numbers | lz::filter([](int i) { return i % 2 == 0; });
    auto squares = evenFilter | lz::map([](int i) { return i *  i; });
    int total = squares | lz::sum();
  • Even though this expression is split over multiple variable assignments, it is not any less efficient.
  • Each intermediate variable simply describes a unit of code to be executed. All held in stack.

https://github.com/SaadAttieh/lazyCode

sa74
  • 1