I'm using Eigen to process an unstructured point set (point cloud), represented as an array of Eigen::Vector3f objects. To enable SIMD vectorization, I've subclassed Vector3f into a class declared with alignas(16). Each object in the array then starts at a 16-byte boundary, and consecutive objects are separated by a 4-byte gap that contains uninitialized data.
The subclass currently looks like this (I still need to add the templated copy constructor and operator= as indicated in the Eigen documentation):
struct alignas(16) point_xyz : public Eigen::Vector3f {
using Eigen::Vector3f::Vector3f;
};
point_xyz cloud[n];
Assembly output shows that SIMD instructions are being used, and a program that applies a transformation to each point_xyz in the array seems to work correctly.
Is it safe to use Eigen this way, or do the results depend on the contents of the unused 4-byte gaps, etc.?
Also, would it be safe to store RGB color data or other data in the unused 4 bytes (this would require overriding the memory alignment)?
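To make the layout in question concrete, here is a minimal sketch that does not depend on Eigen. The type vec3f is a hypothetical stand-in for Eigen::Vector3f (three packed floats), and padded_point, colored_point and all_aligned are names made up for illustration. It only shows what alignas(16) does to size and alignment; it says nothing about whether Eigen's vectorized code paths read or write the extra 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Eigen::Vector3f: three packed floats (12 bytes).
struct vec3f {
    float x, y, z;
};

// Same pattern as point_xyz in the question: over-align a 12-byte base.
struct alignas(16) padded_point : vec3f {};

// Variant that names the 4 spare bytes instead of leaving them as padding.
struct alignas(16) colored_point : vec3f {
    std::uint8_t r, g, b, a;
};

static_assert(sizeof(vec3f) == 12, "three packed floats");
static_assert(alignof(padded_point) == 16, "elements start on 16-byte boundaries");
static_assert(sizeof(padded_point) == 16, "size is rounded up, leaving a 4-byte gap");
static_assert(sizeof(colored_point) == 16, "color bytes occupy the former gap");

// Returns true if every element of the array starts on a 16-byte boundary.
bool all_aligned(const padded_point* pts, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (reinterpret_cast<std::uintptr_t>(&pts[i]) % 16 != 0) return false;
    }
    return true;
}
```

The static_asserts assume the usual 4-byte float; if they hold, an array of such points has a stride of 16 bytes, which is the situation described above.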
Edit: It seems that both clang++ and g++ do some vectorization when optimization is enabled. Without optimization (and below -O2 for clang++), both generate a call to an Eigen library function for the following matrix multiplication (transformation):
using transform_t = Eigen::Transform<float, 3, Eigen::Affine>;
transform_t t = Eigen::AngleAxisf(0.01*M_PI, Eigen::Vector3f::UnitX()) * Eigen::Translation3f(0.1, 0.1, 0.1);
Eigen::Vector3f p(123, 234, 345);
std::cout << p << std::endl;
for(;;) {
asm("# BEGIN TRANS");
p = t * p;
asm("# END TRANS");
}
std::cout << p << std::endl;
(The for loop and cout are needed so that the optimizer doesn't remove the multiplication or replace it with a constant value.)
With GCC (-O1), this results in:
# 50 "src/main.cc" 1
# BEGIN TRANS
# 0 "" 2
movss (%rsp), %xmm4
movaps %xmm4, %xmm2
mulss 64(%rsp), %xmm2
movss 4(%rsp), %xmm0
movaps %xmm0, %xmm1
mulss 80(%rsp), %xmm1
addss %xmm1, %xmm2
movss 8(%rsp), %xmm3
movaps %xmm4, %xmm5
mulss 68(%rsp), %xmm5
movaps %xmm0, %xmm1
mulss 84(%rsp), %xmm1
addss %xmm5, %xmm1
movaps %xmm3, %xmm5
mulss 100(%rsp), %xmm5
addss %xmm5, %xmm1
addss 116(%rsp), %xmm1
mulss 72(%rsp), %xmm4
mulss 88(%rsp), %xmm0
addss %xmm4, %xmm0
movaps %xmm3, %xmm4
mulss 104(%rsp), %xmm4
addss %xmm4, %xmm0
addss 120(%rsp), %xmm0
mulss 96(%rsp), %xmm3
addss %xmm3, %xmm2
addss 112(%rsp), %xmm2
movss %xmm2, (%rsp)
movss %xmm1, 4(%rsp)
movss %xmm0, 8(%rsp)
# 52 "src/main.cc" 1
# END TRANS
# 0 "" 2
The output is the same with and without #define EIGEN_DONT_VECTORIZE 1. With Vector4f, a slightly shorter output is generated when Eigen's vectorization is not disabled, but both versions operate on the xmm registers.
AlignedVector3<float> doesn't seem to support multiplication with Eigen::Transform. I'm doing affine transformations on sets of points, represented using 3 (non-homogeneous) coordinates. I'm not sure how Eigen implements the transformation of an Eigen::Vector4f vector by an Eigen::Transform<float, 3, Eigen::Affine>. That is, does it only change the first 3 components of the vector? Does the fourth component have to be zero, or can it contain an arbitrary value? Or does it interpret the 4-vector as homogeneous coordinates? And does this depend on the internal representation of the transformation (Affine, AffineCompact, Projective)?
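To spell out what "interpreting the 4-vector as homogeneous coordinates" would mean, here is a small Eigen-free sketch of the general math convention. All names (vec3, vec4, mat4, make_translation, apply_affine, apply_homogeneous) are hypothetical, and this is not a statement about what Eigen::Transform actually does with a Vector4f. Under this convention, a fourth component of 1 reproduces the affine result on the first three components, while a fourth component of 0 drops the translation.

```cpp
#include <array>
#include <cassert>

using vec3 = std::array<float, 3>;
using vec4 = std::array<float, 4>;
// Row-major 4x4 matrix; for an affine map the last row is (0, 0, 0, 1).
using mat4 = std::array<std::array<float, 4>, 4>;

// Builds a pure translation as a 4x4 homogeneous matrix.
mat4 make_translation(float tx, float ty, float tz) {
    mat4 m{};  // all zeros
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = tx;
    m[1][3] = ty;
    m[2][3] = tz;
    return m;
}

// Affine map on a 3-vector: linear part times p, plus the translation column.
vec3 apply_affine(const mat4& m, const vec3& p) {
    vec3 r{};
    for (int i = 0; i < 3; ++i) {
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
    }
    return r;
}

// Full 4x4 matrix-vector product on a homogeneous 4-vector.
vec4 apply_homogeneous(const mat4& m, const vec4& p) {
    vec4 r{};
    for (int i = 0; i < 4; ++i) {
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3] * p[3];
    }
    return r;
}
```

For a translation by (1, 2, 3), the point (10, 20, 30) maps to (11, 22, 33) via apply_affine and via apply_homogeneous with a fourth component of 1, but stays at (10, 20, 30) when the fourth component is 0. The question above is which of these behaviors, if any, Eigen implements.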