Copy overhead when returning (big) objects?

Question

Consider the following two implementations of a simple Matrix4x4 Identity method.

1: This one takes a Matrix4x4 reference as parameter, in which the data is directly written.

static void CreateIdentity(Matrix4x4& outMatrix) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            outMatrix[i][j] = i == j ? 1 : 0;
        }
    }
}

2: This one returns a Matrix4x4 without taking any input.

static Matrix4x4 CreateIdentity() {
    Matrix4x4 outMatrix;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            outMatrix[i][j] = i == j ? 1 : 0;
        }
    }
    return outMatrix;
}

Now, if I want to actually create an Identity-Matrix I have to do

Matrix4x4 mat;
Matrix4x4::CreateIdentity(mat);

for the first variant and

Matrix4x4 mat = Matrix4x4::CreateIdentity();

for the second.

The first one obviously has the advantage that not a single unnecessary copy is made, but it cannot be used as an rvalue; imagine

Matrix4x4 mat = Matrix4x4::Identity()*Matrix4x4::Translation(5, 7, 6);

Final question: Is there a way to avoid unnecessary copies when using methods like Matrix4x4::CreateIdentity() whenever possible, while still allowing the method to be used as an rvalue as in my last code example? Is it even optimised automatically by the compiler? I'm rather confused about how to go about this (seemingly) simple task efficiently. Maybe I should implement both versions and use whichever is appropriate?

There is at least one other way to avoid copy, that is less awkward syntactically: return via `std::move`, and define a move constructor and a move assignment operator in your class. — Violet Giraffe, Feb 10 '16 at 12:57
@VioletGiraffe I think a simple return is supposed to already use the move constructor where applicable, i.e. `select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const)`. But I'm not sure if this always holds as I perceive it. Thoughts? — Yam Marcovic, Feb 10 '16 at 13:08
@VioletGiraffe I remembered, I once got a warning for an object that had neither a move nor a copy constructor, and the warning was "Note this wouldn't work if NRVO didn't happen here." But here that's not the case. So once again, thoughts? — Yam Marcovic, Feb 10 '16 at 13:10
@VioletGiraffe If you use `std::move` then NRVO cannot happen and the resulting code could be less efficient, if I am not mistaken. — rozina, Feb 10 '16 at 13:11
Possible duplicate of [Returning std::vector by value](http://stackoverflow.com/questions/11247654/returning-stdvector-by-value) — rozina, Feb 10 '16 at 13:14
@rozina Seems to be correct (also as far as I recall from experience): `If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration [...]` That is, in `return std::move(X)`, `std::move(X)` would no longer be the name of an object. — Yam Marcovic, Feb 10 '16 at 13:16
An approach I have used is to create an empty `Identity` type, and add constructors and assignment operators that take one of these. Then the constructor/assignment operator takes care of making the internal representation consistent with an identity matrix. — juanchopanza, Feb 10 '16 at 13:29
@juanchopanza Sounds interesting and clever. This way you don't need to bother with spending proportional memory on identity matrices at all. — Yam Marcovic, Feb 10 '16 at 13:30
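
The tag-type approach from the last two comments could look roughly like the sketch below. This is one reading of the comment, not juanchopanza's actual code: the `Identity` tag, the constructor and assignment overloads, the `operator()` accessor and the `IsIdentity` helper are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

// Empty tag type (hypothetical): carries no data, it only selects an overload.
struct Identity {};

class Matrix4x4 {
public:
    // Default construction zero-initializes all 16 elements.
    Matrix4x4() : m_{} {}

    // Constructing or assigning from the tag writes the identity in place,
    // so no temporary identity matrix is ever built or copied.
    Matrix4x4(Identity) : m_{} { SetIdentity(); }
    Matrix4x4& operator=(Identity) {
        SetIdentity();
        return *this;
    }

    // Read-only element access with a debug-time bounds check.
    float operator()(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m_[row][col];
    }

private:
    void SetIdentity() {
        for (int i = 0; i < 4; ++i) {
            for (int j = 0; j < 4; ++j) {
                m_[i][j] = (i == j) ? 1.0f : 0.0f;
            }
        }
    }

    std::array<std::array<float, 4>, 4> m_;
};

// Helper for checking the result (hypothetical, not from the thread).
bool IsIdentity(const Matrix4x4& m) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            if (m(i, j) != ((i == j) ? 1.0f : 0.0f)) {
                return false;
            }
        }
    }
    return true;
}
```

Usage would then read `Matrix4x4 mat = Identity{};` or `mat = Identity{};`, with the matrix filling in its own storage.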

Yam Marcovic · Accepted Answer · 2016-02-10T13:27:51.047

You mostly don't need to worry about this, given that copy elision (in this case, NRVO¹) is explicitly permitted by the standard and implemented by the mainstream compilers.

In a bit more detail (dangerously), the version returning a matrix will, most likely, end up allocating it on the stack of the calling function and only initializing it in the called function, without any copy constructors being called.

So unless something is inhibiting this (which you can find out by running it and checking if a copy constructor is or isn't called), then you mostly Don't Need to Worry About It.
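
One way to run that check is sketched below. `CountedMatrix`, `Counts`, `CountPlainReturn` and `CountMoveReturn` are hypothetical names, not part of the question's code; the type merely stands in for `Matrix4x4` and counts its copy and move constructions. The second factory uses `return std::move(...)`, which the comments on the question point out as one thing that inhibits NRVO.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for Matrix4x4 that counts copy/move constructions.
struct CountedMatrix {
    static int copies;
    static int moves;
    std::array<std::array<float, 4>, 4> m;

    CountedMatrix() : m{} {}
    CountedMatrix(const CountedMatrix& other) : m(other.m) { ++copies; }
    CountedMatrix(CountedMatrix&& other) noexcept : m(other.m) { ++moves; }
};
int CountedMatrix::copies = 0;
int CountedMatrix::moves = 0;

// Plain return of a named local: an NRVO candidate.
CountedMatrix CreateIdentity() {
    CountedMatrix out;
    for (int i = 0; i < 4; ++i) out.m[i][i] = 1.0f;
    return out;
}

// std::move(out) is no longer "the name of a local object", so NRVO
// cannot apply and the move constructor has to run.
CountedMatrix CreateIdentityWithMove() {
    CountedMatrix out;
    for (int i = 0; i < 4; ++i) out.m[i][i] = 1.0f;
    return std::move(out);
}

struct Counts {
    int copies;
    int moves;
};

// Resets the counters, builds one matrix, reports what was counted.
Counts CountPlainReturn() {
    CountedMatrix::copies = 0;
    CountedMatrix::moves = 0;
    CountedMatrix mat = CreateIdentity();
    assert(mat.m[0][0] == 1.0f);  // sanity check; also keeps `mat` used
    return {CountedMatrix::copies, CountedMatrix::moves};
}

Counts CountMoveReturn() {
    CountedMatrix::copies = 0;
    CountedMatrix::moves = 0;
    CountedMatrix mat = CreateIdentityWithMove();
    assert(mat.m[0][0] == 1.0f);
    return {CountedMatrix::copies, CountedMatrix::moves};
}
```

With NRVO applied, the plain version typically reports zero copies and zero moves, while the `std::move` version reports at least one move. Exact counts can vary with compiler and flags (for example `-fno-elide-constructors` on GCC and Clang), so treat the numbers as a diagnostic rather than a guarantee.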

If copy elision can't happen (or just won't for some reason, for example if the compiler doesn't want to, since it doesn't have to), then you can still make sure to provide a move constructor which would then be used instead². The good thing here is that it would even work when your return statement involves a conversion to the actual returned type.
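
A small sketch of that last point, using `std::unique_ptr` because it is move-only: the function below compiles only if the local is treated as an rvalue on return, even though its type differs from the return type. `Shape`, `Square`, `MakeShape` and `SidesOfNewShape` are made-up names for illustration, and the sketch assumes a compiler that implements the two-stage overload resolution described in the references (in practice, any reasonably recent one).

```cpp
#include <cassert>
#include <memory>

// Hypothetical class hierarchy, used only to get two distinct pointer types.
struct Shape {
    virtual ~Shape() = default;
    virtual int Sides() const = 0;
};

struct Square : Shape {
    int Sides() const override { return 4; }
};

// The local is a unique_ptr<Square>, the return type is unique_ptr<Shape>.
// Copy elision cannot apply (different types), and unique_ptr cannot be
// copied, so this only works because `square` is first tried as an rvalue
// and the converting move constructor is selected.
std::unique_ptr<Shape> MakeShape() {
    std::unique_ptr<Square> square(new Square());
    return square;  // no std::move needed
}

// Small helper that exercises the factory.
int SidesOfNewShape() {
    std::unique_ptr<Shape> shape = MakeShape();
    assert(shape != nullptr);
    return shape->Sides();
}
```

The same mechanism would apply to a matrix class with a converting move constructor, although, as noted in the comments, a move is only cheaper than a copy when the type is efficiently movable.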

References:

¹ If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration, which isn't the function parameter, or a catch clause parameter, and which has the same type (ignoring top-level cv-qualification) as the return type of the function, then copy/move is omitted. When that local object is constructed, it is constructed directly in the storage where the function's return value would otherwise be moved or copied to. This variant of copy elision is known as NRVO, "named return value optimization".

² If expression is an lvalue expression and the conditions for copy elision are met, or would be met, except that expression names a function parameter, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const), and if no suitable conversion is available, overload resolution is performed the second time, with lvalue expression (so it may select the copy constructor taking a reference to non-const).

The above rule applies even if the function return type is different from the type of expression (copy elision requires same type).

You may want to mention that if they implement move semantics for their class, then the move will kick in whenever copy elision cannot happen. — NathanOliver, Feb 10 '16 at 13:13
@NathanOliver Although that would only help if the type is efficiently movable. — juanchopanza, Feb 10 '16 at 13:27
@juanchopanza Right, which--in the case of Matrix--might not be the case, e.g. if it internally uses an array to hold its data as part of the object memory. — Yam Marcovic, Feb 10 '16 at 13:28
@YamMarcovic Exactly, this is quite common for small matrix types. BTW, it may be worth exploring expression templates to avoid creating temporaries in complicated matrix expressions. — juanchopanza, Feb 10 '16 at 13:30
@juanchopanza Wow, I didn't know that technique. I'm blown away. Thanks! — Yam Marcovic, Feb 10 '16 at 13:50
First of all, thanks for your answer. I spent the day trying out various versions (including looking at the generated assembly) and decided to follow your advice to "don't care about it (for now)". I realized that the copy constructor is called less often than I thought. Thanks for pointing me in the right direction (I read a lot about copy elision/(N)RVO, as well as move semantics). I will come back to the relevant code when I actually realize it should be optimized more. — LukeG, Feb 10 '16 at 18:48
