
I am writing a general-purpose library using Eigen for computational mechanics, dealing mostly with 6x6 matrices and 6x1 vectors. I am considering using the Eigen::Ref<> template so that the functions also accept segments and blocks, as documented in http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html and in the question "Correct usage of the Eigen::Ref<> class".

However, a small performance comparison shows that, for such small functions, Eigen::Ref has considerable overhead compared to standard C++ references:

#include <ctime>
#include <iostream>
#include "Eigen/Core"


Eigen::Matrix<double, 6, 6> testRef(const Eigen::Ref<const Eigen::Matrix<double, 6, 6>>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A;
    temp.diagonal().setOnes();
    return temp;
}

Eigen::Matrix<double, 6, 6> testNoRef(const Eigen::Matrix<double, 6, 6>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A; 
    temp.diagonal().setOnes();
    return temp;
}


int main(){

  using namespace std;

  int cycles = 10000000;
  Eigen::Matrix<double, 6, 6> testMat;
  testMat = Eigen::Matrix<double, 6, 6>::Ones();

  clock_t begin = clock();

  for(int i = 0; i < cycles; i++)
      testMat = testRef(testMat);

  clock_t end = clock();


  double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

  std::cout << "Ref: " << elapsed_secs << std::endl;

  begin = clock();

  for(int i = 0; i < cycles; i++)
      testMat = testNoRef(testMat);
  end = clock();

  elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

  std::cout << "noRef : " << elapsed_secs << std::endl;


  return 0;
}

Output with gcc -O3:

Ref: 1.64066
noRef : 1.1281

So it seems that Eigen::Ref has considerable overhead, at least when the actual computational work per call is small. On the other hand, taking the argument as const Eigen::Matrix<double, 6, 6>& A leads to unnecessary copies when blocks or segments are passed:

#include <Eigen/Core>
#include <iostream>


void test( const Eigen::Vector3d& a)
{
    std::cout << "addr in function " << &a << std::endl;
}

int main () {

    Eigen::Vector3d aa;
    aa << 1,2,3;
    std::cout << "addr outside function " << &aa << std::endl;

    test ( aa ) ;
    test ( aa.head(3) ) ;


    return 0;
}

Output:

addr outside function 0x7fff85d75960
addr in function 0x7fff85d75960
addr in function 0x7fff85d75980

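So this approach is ruled out for the general case: aa.head(3) is a block expression, not a Vector3d, so a temporary Vector3d is created and bound to the const reference, which is why the third address differs.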

Alternatively, one could write function templates taking Eigen::MatrixBase, as described in the documentation. However, this seems inefficient for large libraries, and I don't see how to restrict it to the fixed-size matrices (6x6, 6x1) of my use case.

Is there any other alternative? What is the general recommendation for large general-purpose libraries?

Thank you in advance!

edit: Modified the first benchmark example according to the recommendations in the comments

mneuner
  • Can't reproduce with optimizations on: I got `Ref: 0.069` and `noRef : 0.069`. If performance without optimizations is important, then yes, Eigen in general will have a huge overhead, but that mostly disappears with optimizations. – PeterT Oct 14 '18 at 08:57
  • Did you test with optimization enabled (e.g. `-O2`)? If not, your results are not trustworthy. But even if you did: your test functions have no side effects, so there is a danger that `Eigen::Matrix temp = (A * A);` is optimized away. You should return the values, store them (e.g. in a `vector`) and print them after measuring, to prevent them from "vanishing" due to optimization. Looking at the assembly could also help uncover which parts of your code actually reach the binary. – Scheff's Cat Oct 14 '18 at 08:58
  • The benchmark is invalid because once optimizations are turned on, the compiler is free to remove the body of the functions entirely. – ggael Oct 14 '18 at 16:27
  • I modified the benchmark example and enabled optimization. However, a difference remains. – mneuner Oct 14 '18 at 19:17
  • @macmallow I still can't reliably make it slower. VS2017 now makes it somewhat slower, but if I keep repeating the test, the variance within the same test seems larger than the difference in the mean between the two. With gcc, Ref is now faster most of the time. I guess it's time to ask for your compiler version, CPU and Eigen version. – PeterT Oct 15 '18 at 07:57
  • @PeterT Sure: g++ (GCC) 8.2.1 2018083, Eigen 3.3.5 and Intel© Core™ i7-7820X CPU @ 3.60GHz × 8 – mneuner Oct 15 '18 at 08:23

1 Answer


With Ref<> you pay the price of losing two pieces of information (compared to Matrix):

  1. You lose the knowledge that the input is memory-aligned.
  2. You lose the compile-time knowledge that the columns are stored back to back (and thus that two consecutive columns are exactly 6 doubles apart).

That's the classic trade-off between genericity and peak performance.

ggael
  • I see. So may I conclude that I should stick to plain C++ references as long as possible (i.e., as long as temporary copies are ruled out), and implement functions with Ref<> only where necessary? – mneuner Oct 15 '18 at 08:29