
I have a cloud of points stored in a std::vector<double> as consecutive x, y, z coordinates, and a std::vector<int> of indices in which each triplet of consecutive integers is the connectivity of one face. Basically, a simple triangular mesh data structure.

I have to compute the areas of all the faces and I am benchmarking several methods:

I can wrap chunks of data in an Eigen::Map<const Eigen::Vector3d> like this:

static void face_areas_eigenmap(const std::vector<double>& V,
                                const std::vector<int>& F,
                                std::vector<double>& FA) {
  // Number of faces is size / 3.
  for (auto f = 0; f < F.size() / 3; ++f) {
    // Get vertex indices of face f.
    auto v0 = F[f * 3];
    auto v1 = F[f * 3 + 1];
    auto v2 = F[f * 3 + 2];
    
    // View memory at each vertex position as a vector.
    Eigen::Map<const Eigen::Vector3d> x0{&V[v0 * 3]};
    Eigen::Map<const Eigen::Vector3d> x1{&V[v1 * 3]};
    Eigen::Map<const Eigen::Vector3d> x2{&V[v2 * 3]};
    
    // Compute and store face area.
    FA[f] = 0.5 * (x1 - x0).cross(x2 - x0).norm();
  }
}

Or I can choose to create Eigen::Vector3d like this:

static void face_areas_eigenvec(const std::vector<double>& V,
                                const std::vector<int>& F,
                                std::vector<double>& FA) {
  for (auto f = 0; f < F.size() / 3; ++f) {
    auto v0 = F[f * 3];
    auto v1 = F[f * 3 + 1];
    auto v2 = F[f * 3 + 2];
    
    // This is the only change, swap Map for Vector3d.
    Eigen::Vector3d x0{&V[v0 * 3]};
    Eigen::Vector3d x1{&V[v1 * 3]};
    Eigen::Vector3d x2{&V[v2 * 3]};

    FA[f] = 0.5 * (x1 - x0).cross(x2 - x0).norm();
  }
}

Finally I am also considering the hardcoded version with the explicit cross product and norm:

static void face_areas_ptr(const std::vector<double>& V,
                           const std::vector<int>& F, std::vector<double>& FA) {
  for (auto f = 0; f < F.size() / 3; ++f) {
    const auto* x0 = &V[F[f * 3] * 3];
    const auto* x1 = &V[F[f * 3 + 1] * 3];
    const auto* x2 = &V[F[f * 3 + 2] * 3];

    std::array<double, 3> s0{x1[0] - x0[0], x1[1] - x0[1], x1[2] - x0[2]};
    std::array<double, 3> s1{x2[0] - x0[0], x2[1] - x0[1], x2[2] - x0[2]};

    std::array<double, 3> c{s0[1] * s1[2] - s0[2] * s1[1],
                            s0[2] * s1[0] - s0[0] * s1[2],
                            s0[0] * s1[1] - s0[1] * s1[0]};

    FA[f] = 0.5 * std::sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2]);
  }
}
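
Before comparing timings, it can help to confirm that every variant produces the same areas on a mesh whose answer is known. The following is a minimal sketch, not part of the benchmarked code: the names triangle_area, face_areas_reference, unit_square_vertices and unit_square_faces are hypothetical, and it assumes the flat layout described at the top (V holds x, y, z per vertex, F holds three vertex indices per face). It uses only the standard library and computes 0.5 * |(x1 - x0) × (x2 - x0)| per face.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: area of one triangle given pointers to the first
// coordinate of each of its three vertices (x, y, z stored contiguously).
inline double triangle_area(const double* x0, const double* x1,
                            const double* x2) {
  // Edge vectors from x0.
  const std::array<double, 3> s0{x1[0] - x0[0], x1[1] - x0[1], x1[2] - x0[2]};
  const std::array<double, 3> s1{x2[0] - x0[0], x2[1] - x0[1], x2[2] - x0[2]};

  // Cross product of the two edges.
  const std::array<double, 3> c{s0[1] * s1[2] - s0[2] * s1[1],
                                s0[2] * s1[0] - s0[0] * s1[2],
                                s0[0] * s1[1] - s0[1] * s1[0]};

  // The triangle area is half the norm of the cross product.
  return 0.5 * std::sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2]);
}

// Hypothetical reference implementation over the flat V / F layout.
inline std::vector<double> face_areas_reference(const std::vector<double>& V,
                                                const std::vector<int>& F) {
  assert(V.size() % 3 == 0 && F.size() % 3 == 0);
  std::vector<double> FA(F.size() / 3);
  for (std::size_t f = 0; f < FA.size(); ++f) {
    // Offsets of the three vertices of face f inside V.
    const std::size_t i0 = static_cast<std::size_t>(F[f * 3]) * 3;
    const std::size_t i1 = static_cast<std::size_t>(F[f * 3 + 1]) * 3;
    const std::size_t i2 = static_cast<std::size_t>(F[f * 3 + 2]) * 3;
    FA[f] = triangle_area(&V[i0], &V[i1], &V[i2]);
  }
  return FA;
}

// Test mesh: the unit square in the z = 0 plane, split into two triangles.
inline std::vector<double> unit_square_vertices() {
  return {0.0, 0.0, 0.0,   // vertex 0
          1.0, 0.0, 0.0,   // vertex 1
          1.0, 1.0, 0.0,   // vertex 2
          0.0, 1.0, 0.0};  // vertex 3
}

inline std::vector<int> unit_square_faces() {
  return {0, 1, 2,   // face 0
          0, 2, 3};  // face 1
}
```

Each of the two faces covers half of the unit square, so both areas should come out as 0.5. The output of any of the three functions above can be compared against this reference with a small tolerance.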

I have benchmarked these methods, and the version using Eigen::Map is always the slowest, despite doing exactly the same thing as the one using Eigen::Vector3d. I was expecting no difference in performance, since a map is basically a pointer.

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
BM_face_areas_eigenvec   59757936 ns     59758018 ns           11
BM_face_areas_ptr        58305018 ns     58304436 ns           11
BM_face_areas_eigenmap   62356850 ns     62354710 ns           10
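
The benchmark harness itself is not shown in the post. For readers who want to reproduce the comparison without a benchmarking library, here is a rough stand-in using only std::chrono. It is a sketch under stated assumptions: best_time_ns is a hypothetical name, and taking the minimum over several repetitions is one common way to reduce noise, not necessarily what produced the numbers above.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: call `fn` `repetitions` times and return the fastest
// wall-clock duration of a single call, in nanoseconds.
// If `repetitions` is zero or negative, the loop never runs and the
// maximum int64 value is returned.
template <typename Fn>
std::int64_t best_time_ns(Fn&& fn, int repetitions) {
  using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  std::int64_t best = std::numeric_limits<std::int64_t>::max();
  for (int r = 0; r < repetitions; ++r) {
    const auto start = clock::now();
    fn();
    const auto stop = clock::now();
    const std::int64_t ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
            .count();
    if (ns < best) best = ns;
  }
  return best;
}
```

A call such as best_time_ns([&] { face_areas_ptr(V, F, FA); }, 10) would time the pointer version. FA should be read afterwards (for example, by summing it) so the compiler cannot discard the work.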

I tried replacing the Eigen expression in the map version with the same explicit code as in the pointer version:

std::array<double, 3> s0{x1[0] - x0[0], x1[1] - x0[1], x1[2] - x0[2]};
std::array<double, 3> s1{x2[0] - x0[0], x2[1] - x0[1], x2[2] - x0[2]};

std::array<double, 3> c{s0[1] * s1[2] - s0[2] * s1[1],
                        s0[2] * s1[0] - s0[0] * s1[2],
                        s0[0] * s1[1] - s0[1] * s1[0]};

FA[f] = 0.5 * std::sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2]);

With that change, the timings become comparable:

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
BM_face_areas_array      58967864 ns     58967891 ns           11
BM_face_areas_ptr        60034545 ns     60034682 ns           11
BM_face_areas_eigenmap   60382482 ns     60382027 ns           11

Is there something about using Eigen::Map in Eigen expressions that I should be aware of?

Nicol Bolas
lucmobz
In this simple case the `Map` just adds a level of indirection which the compiler may have trouble optimizing away ... – chtz Dec 01 '21 at 21:38

1 Answer


Looking at the compiler output (https://godbolt.org/z/qs38P41eh), it seems that the second version, the one using Eigen::Vector3d, lets the compiler emit fewer memory loads by combining some of them into vector loads.

Eigen's code for cross does not contain any explicit vectorization. It depends on the compiler doing a good job with it. And because you call cross on an expression (the subtractions), the compiler gives up a little too soon. Basically, it is the compiler's fault for not finding the same optimization.

Your third code works the same as the second because the compiler recognizes the subtractions (the creation of s0 and s1) as something it can vectorize, resulting in equivalent code. You can achieve the same with Eigen if you do it like this:

    Eigen::Map<const Eigen::Vector3d> x0{&V[v0 * 3]};
    Eigen::Map<const Eigen::Vector3d> x1{&V[v1 * 3]};
    Eigen::Map<const Eigen::Vector3d> x2{&V[v2 * 3]};
    
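    // Evaluate the edge vectors into plain Vector3d objects first, so that
    // cross() is called on vectors rather than on subtraction expressions.
    Eigen::Vector3d s0 = x1 - x0;
    Eigen::Vector3d s1 = x2 - x0;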

    // Compute and store face area.
    FA[f] = 0.5 * s0.cross(s1).norm();
Homer512