17

I am using the Eigen library in C++: I am currently calculating the covariance matrix myself as follows:

Eigen::MatrixXd covariance_matrix = Eigen::MatrixXd::Constant(21, 21, 0);
data mean = calc_mean(all_data)
for(int j = 0; j < 21; j++){
    for(int k = 0; k < 21; k++){
        for(std::vector<data>::iterator it = all_data.begin(); it!= all_data.end(); it++){
            covariance_matrix(j,k) += ((*it)[j] - mean[j]) * ((*it)[k] - mean[k]);
        }
        covariance_matrix(j,k) /= all_data.size() - 1;
    }
}

Is there an inbuilt/more optimized way to do this with the Eigen library? For example if I store my data in a MatrixXd where each row is an observation and each column a feature?

Thanks

us2012
  • 16,083
  • 3
  • 46
  • 62
Aly
  • 15,865
  • 47
  • 119
  • 191
  • If you don't mind using OpenCV, you may use [cvMatchTemplate() with method=CV_TM_CCORR_NORMED](https://stackoverflow.com/questions/4621844/cross-correlation-of-two-arrays-in-opencv). – jhuai Jan 19 '21 at 05:59

2 Answers2

57

Using Eigen expressions will leverage SIMD and cache optimized algorithms, so yes it should definitely be faster, and in any case, much simpler to write:

MatrixXd centered = mat.rowwise() - mat.colwise().mean();
MatrixXd cov = (centered.adjoint() * centered) / double(mat.rows() - 1);

Moreover, assuming "data" is a typedef for a double[21], then you can use the Map<> feature to view your std::vector as an Eigen object:

Map<Matrix<double,Dynamic,21,RowMajor> > mat(&(all_data[0][0], all_data.size(), 21);
Ben Jackson
  • 90,079
  • 9
  • 98
  • 150
ggael
  • 28,425
  • 2
  • 65
  • 71
  • 7
    Be careful for `mat.rows() == 1`. – Anne van Rossum Apr 22 '15 at 12:28
  • Why `mat.colwise()`? – quant_dev May 10 '20 at 12:56
  • @quant_dev You compute the mean for each column which gives you a row vector, i.e. if `auto xm = mat.colwise().mean()` then `xm[i]` is the mean of the i-th column of `mat`. Then for each row of `mat` you need to subtract this mean-vector to make the matrix centered, so you need to compute `mat.rowwise() - xm`, which you can also write as a one liner. Note, that it is assumed here, that a feature vector is a row vector and samples go downwards (vertical). If you organise your data with feature vectors being column vectors and samples going across, you need to transpose the formula. – Elmar Zander Mar 21 '22 at 21:41
  • BTW: don't try to replace the `MatrixXd` with `auto` or you'll get into big trouble. If you really want to use `auto`, you need to put the whole formulas into parentheses and `.eval()` them. – Elmar Zander Mar 21 '22 at 21:45
  • I don't understand the (centered.adjoint() * centered). How does it work? I did not see it being used anywhere else. – Lenz Jun 24 '22 at 07:33
7

When each row is an observation, you can use the matrix formulation for the sample covariance matrix as shown on wikipedia ( http://en.wikipedia.org/wiki/Sample_mean_and_sample_covariance#Sample_covariance )

Sample covariance, source: wikipedia article linked above .

This is fairly easy to write in terms of Eigen matrix multiplications etc. Whether it will be more performant isn't obvious to me, I suspect the optimizer would have to do a really good job (be sure to use at least -O2). It may be worth trying and profiling it.

us2012
  • 16,083
  • 3
  • 46
  • 62