8

Assume I have vector of strings and I want concatenate them via std::accumulate.

If I use the following code:

std::vector<std::string> foo{"foo","bar"};
string res=""; 
res=std::accumulate(foo.begin(),foo.end(),res,
  [](string &rs,string &arg){ return rs+arg; });

I can be pretty sure there will be temporary object construction.

In this answer they say that the effect of std::accumulate is specified this way:

Computes its result by initializing the accumulator acc with the initial value init and then modifies it with acc = acc + *i or acc = binary_op(acc, *i) for every iterator i in the range [first,last) in order.

So I'm wondering what is the correct way to do this to avoid the unnecessary temporary object construction.

One idea was to change the lambda this way:

[](string &rs,string &arg){ rs+=arg; return rs; }

In this case, I thought I force efficient concatenation of the strings and help the compiler (I know I shouldn't) omit the unnecessary copy, since this should be equivalent to (pseudocode):

accum = [](& accum,& arg){ ...; return accum; }

and thus

accum = & accum;

Another idea was to use

accum = [](& accum,& arg){ ...; return std::move(accum); }

But this would probably lead to something like:

accum = std::move(& accum);

Which looks very suspicious to me.

What is the correct way to write this to minimize the risk of the unnecessary creation of temporary objects? I'm not just interested in std::string, I'd be happy to have a solution, that would probably work for any object that has copy and move constructors/assignments implemented.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
tach
  • 663
  • 12
  • 19
  • 1
    You should just make a function for concatenation... – user541686 Oct 29 '13 at 16:43
  • 1
    An ugly alternative is to use a pointer to a local `std::string` variable as the accumulator, and perhaps `reserve` beforehand. Though now, `accumulate` is reduced merely to a `for_each`, and isn't much better than David's solution below. – R. Martinho Fernandes Oct 29 '13 at 16:51
  • 2
    It appears that `std::accumulate` will *always* make temporary copies. If this is unacceptable then you need to use something else. – Mark Ransom Oct 29 '13 at 16:52
  • @MarkRansom `std::accumulate` doesn't make temporary copies; it is the `operator+` that it calls which makes the additional copy. (The `operator=` may also end up copying; with C++ and move semantics, it likely won't, but with earlier versions it would.) – James Kanze Oct 30 '13 at 09:25
  • C++20 specifies it as `acc = move(acc) + rhs`, which can considerably cheapen accumulation for types that aren't cheap to copy. For example, a good `std::string` implementation will have an `operator+(string&& lhs, T)` that hijacks the `lhs`, appends to that, & returns it (which is RVOable). @R.MartinhoFernandes A less ugly equivalent for that is to accumulate in a `reference_wrapper`, as in [this answer](https://stackoverflow.com/a/39494205/2757035), but yes, I do wonder if that is really much/any better than just a `for[_each]` with a captured reference. I guess maybe slightly, semantically? – underscore_d Sep 24 '18 at 16:29

4 Answers4

11

I would break this into two operations, first std::accumulate to obtain the total length of the string that needs to be created, then a std::for_each with a lambda that updates the local string:

std::string::size_type total = std::accumulate(foo.begin(), foo.end(), 0u, 
                [](std::string::size_type c, std::string const& s) {
                    return c+s.size() 
                });
std::string result;
result.reserve(total);
std::for_each(foo.begin(), foo.end(), 
              [&](std::string const& s) { result += s; });

The common alternative to this is using expression templates, but that does not fit in an answer. Basically you create a data structure that maps the operations, but does not execute them. When the expression is finally evaluated, it can gather the information it needs upfront and use that to reserve the space and do the copies. The code that uses the expression template is nicer, but more complicated.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • 1
    I'm trying to get the std::accumulate to work efficiently - that means avoid unnecessary creation of temporary objects. I don't mind reallocs. I'm able to avoid std::accumulate and force the efficient behavior other way, but that is not what I'm looking for. – tach Oct 29 '13 at 16:52
  • Nice idea! It probably won't get any more efficient than this and it's short enough code too. +1 – stefan Oct 29 '13 at 16:53
  • 4
    @tach: You can choose the behavior you want or the tool you use, but you cannot unscrew with a hammer. --Although that is not entirely true, if you are willing to put enough effort you can create infrastructure to do that (again, expression templates type of approach) – David Rodríguez - dribeas Oct 29 '13 at 17:02
  • I'm assuming that in some cases the compiler might be able to optimize out the temporary objects creation, or at least somehow mitigate it. I would love to use std::accumulate because of its syntax, but if it is always inefficient I would say it's usefulness is quite reduced. – tach Oct 29 '13 at 17:07
  • @tach: The problem is not one of optimization from the compilers point of view of what optimization is. You want it to change `operator+` calls and map them into `operator+=`, but the compiler does not know of that equivalency. Note that while `std::string` is part of the standard, it is implemented as a user defined type, the compiler most probably does not know as much as you do. – David Rodríguez - dribeas Oct 29 '13 at 17:09
  • I think you have the right idea but are wrong on a couple of details. I had to make some mods to get this to compile. Do you know why `accumulate` isn't summing the sizes? http://ideone.com/Kc1vf8 – Mark Ransom Oct 29 '13 at 18:50
  • @MarkRansom: I did not pay attention to the details. In your ideone link the problem is that `accumulate` returns a new value, it does not modify the argument. The result of accumulate is ignored (both in my original code and your code) – David Rodríguez - dribeas Oct 29 '13 at 18:54
  • Btw, if I was really asking how to make the concatenation efficient and not about the optimization of the std::accumulate, I would prefer this code: http://ideone.com/2kpX2q . It is semantically almost the same as your code, but it is less typing. (Even with the std::string::size_type it would be shorter). – tach Oct 29 '13 at 23:23
  • @tach: note that the accepted answer is not guaranteed to optimize the cost of the temporaries. An implementation of `std::string` might use the copy-and-swap idiom and in that case the assignment hidden inside the `std::accumulate` would require the creation of the extra string. VS does check for self assignment explicitly, and gcc uses (an incorrect) reference counting, so in neither platform the cost is present, but things might be different in different implementations. – David Rodríguez - dribeas Oct 30 '13 at 01:47
  • @DavidRodríguez-dribeas Regardless of the implementation: the return value of `operator+` is a distinct string from its arguments. In `a = a + b`, the `a + b` operator must create a new object, copying both `a` and `b` into this new object. – James Kanze Oct 30 '13 at 09:22
  • @DavidRodríguez-dribeas When you say that g++ uses an incorrect reference counting, are you referring to the thread safety issue which I discovered many years back, or something else. (In practice, although it's easy to show the thread safety issue, it requires such an exotic combination of actions that I doubt anyone has actually seen it in real code.) – James Kanze Oct 30 '13 at 09:27
  • @JamesKanze: following Dietmar Khül's pressure I now use C++ to refer to C++11 and C++03 for the older standard. In C++11 the implementation is not allowed to use reference counting. To be honest, they have fixed this, but you have to opt into the new behavior since that is a binary incompabible change. As far as I know in g++ 4.8 the unstandard reference counting behavior is maintained. – David Rodríguez - dribeas Oct 30 '13 at 12:37
  • @DavidRodríguez-dribeas I've wondered about this. Pre-C++11, the expressed intent _was_ to allow reference counting, since it is really the preferred implementation for a lot of uses. Is the fact that it is no longer legal intentional (which would represent a major step backwards) or simply an accidental side effect of trying to fix some broken wording? – James Kanze Oct 31 '13 at 09:16
  • @JamesKanze: It looks like completely intentional. Check [N2668](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2668.htm) – David Rodríguez - dribeas Oct 31 '13 at 15:03
  • @DavidRodríguez-dribeas Or at least they realized it, and considered the implications. (I rather liked CoW, but it is difficult to implement efficiently _and_ correctly in a multithreaded environment.) – James Kanze Oct 31 '13 at 19:42
  • regarding the template deduction of `std::acculumate`, I would replace the `0u` with `std::string::size_type(0)` or a static cast to that type. – JHBonarius Sep 09 '20 at 12:35
5

Using std::accumulate efficiently without any redundant copies is not obvious.
In addition to being reassigned and passed into and out-of the lambda, the accumulating value may get copied internally by the implementation.
Also, note that std::accumulate() itself takes the initial value by-value, calling a copy-ctor and thus, ignoring any reserve()s done on the source of the copy (as suggested in some of the other answers).

The most efficient way I found to concatenate the strings is as follows:

std::vector<std::string> str_vec{"foo","bar"};

// get reserve size:
auto sz = std::accumulate(str_vec.cbegin(), str_vec.cend(), std::string::size_type(0), [](int sz, auto const& str) { return sz + str.size() + 1; });

std::string res;
res.reserve(sz);
std::accumulate(str_vec.cbegin(), str_vec.cend(),
   std::ref(res), // use a ref wrapper to keep same object with capacity
   [](std::string& a, std::string const& b) -> std::string& // must specify return type because cannot return `std::reference_wrapper<std::string>`.
{                                                           // can't use `auto&` args for the same reason
   a += b;
   return a;
});

The result will be in res.
This implementation has no redundant copies, moves or reallocations.

Adi Shavit
  • 16,743
  • 5
  • 67
  • 137
  • @VaughnCato: Thanks :-). I actually investigated this same issue and figured this out a couple days before finding the question. – Adi Shavit Sep 18 '16 at 16:46
4

Try the following

res=std::accumulate(foo.begin(),foo.end(),res,
  [](string &rs, const string &arg) -> string & { return rs+=arg; });

Before this call maybe there is a sence to call

std::string::size_type n = std::accumulate( foo.begin(), foo.end(), 
   std::string::size_type( 0 ),
   [] ( std::string_size_type n, const std::string &s ) { return ( n += s.size() ); } );

res.reserve( n );
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • 1
    This will make copies into the accumulator used internally by `accumulate` (it does the equivalent of `accum = op(accum, *it);`. – R. Martinho Fernandes Oct 29 '13 at 16:50
  • 2
    There is no any coping. There is the copy assignment operator that will see that the string tries to assign to itself. – Vlad from Moscow Oct 29 '13 at 16:58
  • @VladfromMoscow: Are you sure? Does that happen only for this case, or does it work for the return rs; that I said in my question too? If I could rely on it, or at least be pretty sure, this would be sufficient for me. – tach Oct 29 '13 at 17:12
  • @tach if you make it `[](string &rs,string &arg) -> string& { ...` then, yes, it applies to your example as well. – R. Martinho Fernandes Oct 29 '13 at 17:14
  • It is true, I just checked it. Thanks a lot! – tach Oct 29 '13 at 17:19
  • lambda-free version: `std::string result; std::accumulate(foo.begin(), foo.end(), result);`. With C+11, the assignment inside std::accumulate (v = v + s) turns into a swap because of move semantics. That's not free but it doesn't involve copying characters. – rici Oct 29 '13 at 18:02
  • @VladfromMoscow That would be a very poor implementation of the assignment operator of `std::string`. Slow down the usual case (however little) to improve the performance of a rare case. – James Kanze Oct 29 '13 at 18:17
  • @rici It's still creating a new string object (with a copy) to return from the `+` operator. – James Kanze Oct 29 '13 at 18:30
  • 1
    @JamesKanze, an assignment operator needs to handle the case where you're assigning to yourself however rare it might be. However there are other techniques such as taking the parameter by copy instead of reference (letting the copy constructor do the work) and swapping the contents - it works but doesn't avoid the overhead as this answer implies. – Mark Ransom Oct 29 '13 at 18:33
  • @MarkRansom: in this case, it's move assignment, not copy assignment. It still needs to handle self-assignment, but it only needs to do a swap, not a copy-and-swap. – rici Oct 29 '13 at 19:08
  • 1
    @JamesKanze: Good point, although the reality is slightly different. It creates a new string with a copy as an argument of `+`, but the result is not copied; it is swapped. That's because `operator+(std::string,std::string)` is defined in terms of `std::string::operator+=` (something like: `std::string r = a; r += b; return r; `) – rici Oct 29 '13 at 19:14
  • @MarkRansom Of course an assignment operator should work if you assign to yourself. But you don't need to test for self assignment for this; in fact, if you need to test for self assignment, the operator is probably broken. – James Kanze Oct 30 '13 at 09:19
  • In case if the first parameter of the lambda expression is `string &`, that does not compile on C++20 – Dmytro Ovdiienko Feb 16 '22 at 13:18
  • Better to handle the first parameter as `auto&&` (perfect forwarding reference). On C++17 `auto&&` will be converted into `std::string&`, on C++20 - `std::string&&`. I'm not sure though if it is correct to upgrade `std::string&&` into `std::string&` in the return statement... – Dmytro Ovdiienko Feb 16 '22 at 13:56
  • Also according to [cppreference.com](https://en.cppreference.com/w/cpp/algorithm/accumulate) the signature of the `op` must be `Ret fun(const Type1 &a, const Type2 &b);`... Look like it is illegal to modify the input parameter. Thus, maybe better to pass the first parameter by value? And because of that do not use `std::accumulate` on strings at all? :) – Dmytro Ovdiienko Feb 16 '22 at 14:15
1

This is a bit tricky, since there are two operations involved, the addition and the assignment. In order to avoid the copies, you have to both modify the string in the addition, and ensure that the assignment is a no-op. It's the second part which is tricky.

What I've done on occasions is to create a custom "accumulator", along the lines of:

class Accu
{
    std::string myCollector;
    enum DummyToSuppressAsgn { dummy };
public:
    Accu( std::string const& startingValue = std::string() )
        : myCollector( startingValue )
    {
    }
    //  Default copy ctor and copy asgn are OK.
    //  On the other hand, we need the following special operators
    Accu& operator=( DummyToSuppressAsgn )
    {
        //  Don't do anything...
        return *this;
    }
    DummyToSuppressAsgn operator+( std::string const& other )
    {
        myCollector += other;
        return dummy;
    }
    //  And to get the final results...
    operator std::string() const
    {
        return myCollector;
    }
};

There'll be a few copies when calling accumulate, and of the return value, but during the actual accumulation, nothing. Just invoke:

std::string results = std::accumulate( foo.begin(), foo.end(), Accu() );

(If you're really concerned about performance, you can add a capacity argument to the constructor of Accu, so that it can do a reserve on the member string. If I did this, I'd probably hand write the copy constructor as well, to ensure that the string in the copied object had the required capacity.)

James Kanze
  • 150,581
  • 18
  • 184
  • 329