std::accumulate C++20 version

Question

I'm trying to understand this code but I can't figure out why this version

for (; first != last; ++first) 
    init = std::move(init) + *first;

is faster than this

for (; first != last; ++first)
    init += *first;

I took both from std::accumulate. The assembly for the first version is longer than for the second. Even if the first version creates an rvalue reference to init, it still creates a temporary by adding *first and then assigns it to init, which is the same process as in the second case, where a temporary is created and then assigned to init. So why is using std::move better than appending the value with the += operator?

EDIT

I was looking at the C++20 version of accumulate, and they say that before C++20 accumulate was this

template<class InputIt, class T>
T accumulate(InputIt first, InputIt last, T init)
{
    for (; first != last; ++first) {
        init = init + *first;
    }
    return init;
}

and since C++20 it has become

template<class InputIt, class T>
constexpr // since C++20
T accumulate(InputIt first, InputIt last, T init)
{
    for (; first != last; ++first) {
        init = std::move(init) + *first; // std::move since C++20
    }
    return init;
}

I just wanted to know whether using std::move gives any real improvement or not.

EDIT2

Ok, here is my example code:

#include <utility>
#include <string>
#include <chrono>
#include <iostream>

using ck = std::chrono::high_resolution_clock;

std::string
test_no_move(std::string str) {

    std::string b = "t";
    int count = 0;

    while (++count < 100000)
        str = str + b;              // Without std::move

    return str;
}

std::string
test_with_move(std::string str) {

    std::string b = "t";
    int count = 0;

    while (++count < 100000)
        str = std::move(str) + b;   // With std::move

    return str;
}

int main()
{
    std::string result;
    auto start = ck::now();
    result = test_no_move("test");
    auto finish = ck::now();

    std::cout << "Test without std::move " << std::chrono::duration_cast<std::chrono::microseconds>(finish - start).count() << std::endl;

    start = ck::now();
    result = test_with_move("test");
    finish = ck::now();

    std::cout << "Test with std::move " << std::chrono::duration_cast<std::chrono::microseconds>(finish - start).count() << std::endl;

    return 0;
}

If you run it, you will notice that the std::move version is much faster than the other one, but if you try it with built-in types, the std::move version comes out slower than the other one.

So my question is: since this situation is probably the same as in std::accumulate, why do they say that the C++20 version of accumulate with std::move is faster than the version without it? Why do I get such an improvement when using std::move with something like strings, but not with something like int? Why is that, if in both cases the program creates a temporary string str + b (or std::move(str) + b) and then moves it into str? I mean, it is the same operation. Why is the std::move version faster?

Thanks for your patience. I hope I made myself clear this time.

It depends on the type of `init`: if that type does not implement any overload taking an rvalue reference, the code will be the same, so please provide a complete example — Alberto Sinigaglia, Jun 16 '20 at 10:24
The second version has never been in the C++ standard. `std::accumulate` always operates either using `operator+()`, or using the `BinaryOperation` template parameter. — Ruslan, Jun 16 '20 at 10:25
`std::accumulate` is a template, so there are several steps needed before you can look at assembly. Can you include a [mcve]? — 463035818_is_not_an_ai, Jun 16 '20 at 10:40
@idclev463035818 no, I misspoke, I didn't mean that I tried to see assembly of a template, sorry, I wrote something like 2 int variables and see the difference in assembly by doing a = a + b and a = std::move(a) + b. — Sam, Jun 16 '20 at 11:35
Post your full benchmark code and how you compile and run it. — Maxim Egorushkin, Jun 16 '20 at 11:38
then please include a [mcve]. How do you call the methods? What compiler and what compiler options did you use? — 463035818_is_not_an_ai, Jun 16 '20 at 11:41
For built-in types there shouldn't be any difference in generated assembly. — Maxim Egorushkin, Jun 16 '20 at 11:56

Evg · Answer 1 · 2020-06-16T14:10:19.333

It is potentially faster for types with non-trivial move semantics. Consider accumulation of std::vector<std::string> of long enough strings:

std::vector<std::string> strings(100, std::string(100, ' '));

std::string init;
init.reserve(10000);
auto r = accumulate(strings.begin(), strings.end(), std::move(init));

For accumulate without std::move,

std::string operator+(const std::string&, const std::string&);

will be used. At each iteration it will allocate storage on the heap for the resulting string, just to throw it away at the next iteration.

For accumulate with std::move,

std::string operator+(std::string&&, const std::string&);

will be used. In contrast to the previous case, the buffer of the first argument can be reused. If the initial string has enough capacity, no additional memory will be allocated during accumulation.
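The overload selection described above can be made visible with a small sketch. Everything here is hypothetical and only for illustration: `Tracer` is a made-up type whose two `operator+` overloads count how often each one runs, and `accumulate_copy` / `accumulate_move` are hand-written stand-ins for the pre-C++20 and C++20 loops, not the library's actual source.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type that records which operator+ overload was called.
struct Tracer {
    int from_lvalue = 0;  // calls that took the left operand as const Tracer&
    int from_rvalue = 0;  // calls that took the left operand as Tracer&&
};

// Chosen when the left operand is an lvalue: must build a fresh result.
Tracer operator+(const Tracer& lhs, int) {
    Tracer out = lhs;
    ++out.from_lvalue;
    return out;
}

// Chosen when the left operand is an rvalue: may reuse lhs in place.
Tracer operator+(Tracer&& lhs, int) {
    ++lhs.from_rvalue;
    return std::move(lhs);
}

// Stand-in for the pre-C++20 loop (assumption: same shape as quoted above).
template <class InputIt, class T>
T accumulate_copy(InputIt first, InputIt last, T init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Stand-in for the C++20 loop.
template <class InputIt, class T>
T accumulate_move(InputIt first, InputIt last, T init) {
    for (; first != last; ++first) init = std::move(init) + *first;
    return init;
}

// Self-check: call from main() to confirm which overload each loop picks.
void check_overload_selection() {
    const std::vector<int> values{1, 2, 3};
    const Tracer copied = accumulate_copy(values.begin(), values.end(), Tracer{});
    const Tracer moved = accumulate_move(values.begin(), values.end(), Tracer{});
    assert(copied.from_lvalue == 3 && copied.from_rvalue == 0);
    assert(moved.from_rvalue == 3 && moved.from_lvalue == 0);
}
```

For `std::string`, the rvalue overload is the one that gets the chance to append into the existing buffer instead of allocating a new one.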

Simple demo

without std::move
n_allocs = 199

with std::move
n_allocs = 0
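A demo of this kind could be sketched roughly as follows. This is not the linked demo itself, only a minimal reconstruction under assumptions: allocations are counted by replacing the global `operator new`, and `accumulate_copy`, `accumulate_move` and `count_allocs` are hypothetical helper names. The exact count without std::move depends on how the standard library implements `operator+`, so only the comparison between the two counts is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Global counter, incremented by the replacement operator new below.
static std::size_t n_allocs = 0;

void* operator new(std::size_t size) {
    ++n_allocs;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hand-written stand-in for the pre-C++20 loop.
template <class InputIt, class T>
T accumulate_copy(InputIt first, InputIt last, T init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Hand-written stand-in for the C++20 loop.
template <class InputIt, class T>
T accumulate_move(InputIt first, InputIt last, T init) {
    for (; first != last; ++first) init = std::move(init) + *first;
    return init;
}

// Returns the number of heap allocations made while accumulating.
std::size_t count_allocs(bool use_move) {
    const std::vector<std::string> strings(100, std::string(100, ' '));
    std::string init;
    init.reserve(10000);  // room for all 100 * 100 characters

    const std::size_t before = n_allocs;
    const std::string r =
        use_move ? accumulate_move(strings.begin(), strings.end(), std::move(init))
                 : accumulate_copy(strings.begin(), strings.end(), std::move(init));
    const std::size_t after = n_allocs;

    assert(r.size() == 10000);
    return after - before;
}
```

Calling `count_allocs(false)` and `count_allocs(true)` from `main()` and printing both values should show the version with std::move allocating far less, since the reserved buffer is reused across iterations.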

For built-in types like int, move is just a copy - there is nothing to move. For an optimized build, most likely you'll get exactly the same assembly code. If your benchmarking shows any speed improvement/degradation, most likely you're not doing it correctly (no optimization, noise, code optimized out, etc.).
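The point about built-in types can be checked directly. The sketch below assumes C++14 or later (for `constexpr` loops and `constexpr std::move`); `sum_plain`, `sum_moved` and `int_survives_move` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <utility>

// Same loop as the pre-C++20 accumulate, specialised to int for brevity.
constexpr int sum_plain(const int* first, const int* last, int init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Same loop with std::move; for int this is only a cast, the value is copied.
constexpr int sum_moved(const int* first, const int* last, int init) {
    for (; first != last; ++first) init = std::move(init) + *first;
    return init;
}

constexpr int values[] = {1, 2, 3, 4};

// Both versions compute the same result, checked at compile time.
static_assert(sum_plain(values, values + 4, 0) == sum_moved(values, values + 4, 0),
              "std::move does not change the result for int");

// "Moving" from an int leaves the source untouched: there is nothing to steal.
inline bool int_survives_move() {
    int a = 5;
    int b = std::move(a);
    return a == 5 && b == 5;
}
```

Since std::move only changes the value category of the expression, an optimizing compiler has no extra work to do for `int`, which is why the generated code is expected to match.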
