Because everything happens inside a function, the compiler can do escape analysis and optimize away the copies into the vectors.
To check this, I put the code in a function and looked at the generated assembly:
#include <Eigen/Dense>

Eigen::Vector3d foo(const Eigen::VectorXd& vs)
{
    const Eigen::Vector3d v0 = vs.segment<3>(0);
    const Eigen::Vector3d v1 = vs.segment<3>(3);
    const Eigen::Vector3d v2 = vs.segment<3>(6);
    const Eigen::Vector3d vt = v0 + v1 + v2;
    return vt.transpose() * vt * vt + vt;
}
which compiles to this assembly:
push rax
mov rax, qword ptr [rsi + 8]
...
mov rax, qword ptr [rsi]
movupd xmm0, xmmword ptr [rax]
movsd xmm1, qword ptr [rax + 16]
movupd xmm2, xmmword ptr [rax + 24]
addpd xmm2, xmm0
movupd xmm0, xmmword ptr [rax + 48]
addsd xmm1, qword ptr [rax + 40]
addpd xmm0, xmm2
addsd xmm1, qword ptr [rax + 64]
...
movupd xmmword ptr [rdi], xmm2
addsd xmm3, xmm1
movsd qword ptr [rdi + 16], xmm3
mov rax, rdi
pop rcx
ret
Notice how the only memory operations are two general-purpose register loads to get the start pointer and the length, then a couple of loads to get the vector content into registers, before the result is written to memory at the end.
This only works because we deal with fixed-size vectors. With VectorXd, copies would definitely take place.
Alternative benchmarks
Ref is typically used on function calls. Why not try it with a function that cannot be inlined? Or come up with an example where escape analysis cannot work and the objects really have to be materialized. Something like this:
struct Foo
{
public:
    Eigen::Vector3d v0;
    Eigen::Vector3d v1;
    Eigen::Vector3d v2;
    Foo(const Eigen::VectorXd& vs) __attribute__((noinline));
    Eigen::Vector3d operator()() const __attribute__((noinline));
};
Foo::Foo(const Eigen::VectorXd& vs)
    : v0(vs.segment<3>(0)),
      v1(vs.segment<3>(3)),
      v2(vs.segment<3>(6))
{}
Eigen::Vector3d Foo::operator()() const
{
    const Eigen::Vector3d vt = v0 + v1 + v2;
    return vt.transpose() * vt * vt + vt;
}
Eigen::Vector3d bar(const Eigen::VectorXd& vs)
{
    Foo f(vs);
    return f();
}
By splitting initialization and usage into non-inline functions, the copies really have to be done. Of course we now change the entire use case. You have to decide if this is still relevant to you.
Purpose of Ref
Ref exists for the sole purpose of providing a function interface that can accept both a full matrix/vector and a slice of one. Consider this:
Eigen::VectorXd foo(const Eigen::VectorXd&);
This interface can only accept a full vector as its input. If you want to call foo(vector.head(10)), a new vector has to be allocated to hold the segment. Likewise, it always returns a newly allocated vector, which is wasteful if you want to call it as output.head(10) = foo(input). So we can instead write
void foo(Eigen::Ref<Eigen::VectorXd> out, const Eigen::Ref<const Eigen::VectorXd>& in);
and use it as foo(output.head(10), input.head(10)) without any copies being created. This is only ever useful across compilation units: if one cpp file defines a function that is used in another, Ref makes that possible. Within a single cpp file, you can simply use a template:
template<class Derived1, class Derived2>
void foo(const Eigen::MatrixBase<Derived1>& out,
         const Eigen::MatrixBase<Derived2>& in)
{
    Eigen::MatrixBase<Derived1>& mutable_out =
        const_cast<Eigen::MatrixBase<Derived1>&>(out);
    mutable_out = ...;
}
A template will always be faster because it can make use of the concrete data type. For example, if you pass an entire vector, Eigen knows that the array is properly aligned, and for a full matrix it knows that there is no gap between columns. It knows neither of these with Ref. In this regard, Ref is just a fancy wrapper around Eigen::Map<Type, Eigen::Unaligned, Eigen::OuterStride<>>.
Likewise, there are cases where a Ref to a const type has to create a temporary copy. The most common case is an inner stride that is not 1. This happens, for example, if you pass a row of a matrix (but not a column, since Eigen is column-major by default) or the real part of a complex-valued matrix. You will not even receive a warning for this; your code will simply run slower than expected.
The only reasons to use Ref inside a single cpp file are:
- To make the code more readable. The template pattern shown above certainly doesn't tell you much about the expected types.
- To reduce code size, which may have a performance benefit, but usually doesn't. It does help with compile time, though.
Use with fixed-size types
Since your use case seems to involve fixed-sized vectors, let's consider this case in particular and look at the internals.
void foo(const Eigen::Vector3d&);
void bar(const Eigen::Ref<const Eigen::Vector3d>&);
int main()
{
    Eigen::VectorXd in = ...;
    foo(in.segment<3>(6));
    bar(in.segment<3>(6));
}
The following things will happen when you call foo:
- We copy 3 doubles from in[6] to the stack. This takes 4 instructions (2 movapd, 2 movsd).
- A pointer to these values is passed to foo. (Even fixed-size Eigen vectors declare a destructor, so they are always passed on the stack, even if we declare them pass-by-value.)
- foo then loads the values via that pointer, taking 2 instructions (movapd + movsd).
The following happens when we call bar:
- We create a Ref<Vector> object. For this, we put a pointer to in.data() + 6 on the stack.
- A pointer to this pointer is passed to bar.
- bar loads the pointer from the stack, then loads the values.
Notice how there is barely any difference. Ref may save a few instructions, but it also introduces one more indirection. Compared to everything else, this is hardly significant. It is certainly too little to measure.
We are also entering the realm of micro-optimizations here. This can lead to situations where just the arrangement of the code results in measurably different optimizations, as in the question "Eigen: Why is Map slower than Vector3d for this template expression?"