
When using EmpiricalCovariance to estimate a covariance matrix for high-dimensional data, I would expect the diagonal of this matrix (from the top-left to the bottom-right) to be all ones, since a variable always correlates perfectly with itself. However, this is not the case. Why not?

Here is an example, plotted with a seaborn heatmap:

[Figure: covariance matrix plotted as a heatmap. The diagonal from the top-left to the bottom-right is lighter than most of the rest of the data, but not the lightest points.]

As you can see, the diagonal is lighter than most of the data, but it's not as light as the lightest point.
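For reference, a minimal sketch of how I produce the matrix (random data standing in for my real dataset; the shapes are illustrative):

from sklearn.covariance import EmpiricalCovariance
import numpy as np

np.random.seed(0)
data = np.random.rand(100, 10)  # rows are samples, columns are variables
cov = EmpiricalCovariance().fit(data).covariance_
print(np.diag(cov))  # the diagonal entries are not all ones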

Ian

2 Answers


If you look at the implementation of the EmpiricalCovariance class and the utility function it invokes, you will see that np.cov(data.T, bias=1) is (almost) the same as EmpiricalCovariance().fit(data).covariance_.

Let's do an experiment:

from sklearn.covariance import EmpiricalCovariance
import numpy as np

np.random.seed(10)
data = np.random.rand(10, 10)  # rows are samples, columns are variables
# sklearn treats rows as samples, while np.cov treats rows as variables,
# hence the transpose; bias=1 matches sklearn's denominator of n.
np.allclose(EmpiricalCovariance().fit(data).covariance_, np.cov(data.T, bias=1))
# True

From NumPy's official docs you can see that the diagonal elements of the covariance matrix are the variances (here, row variances, since np.cov treats rows as variables):

np.isclose(np.var(data[0]), np.cov(data, bias=1)[0][0])  # variance of row 0
# True
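
If you want ones on the diagonal, what you need is the correlation matrix rather than the covariance matrix. A minimal sketch, continuing with data from above (normalizing the covariance by the standard deviations is equivalent to np.corrcoef):

cov = np.cov(data, bias=1)
std = np.sqrt(np.diag(cov))      # per-variable standard deviations
corr = cov / np.outer(std, std)  # rescale covariance into correlation
np.allclose(corr, np.corrcoef(data))
# True
np.allclose(np.diag(corr), 1.0)
# True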
bubble

See this related SO thread.

In summary: what you see on the diagonal are the variances, not correlations.
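
A quick way to see this (a sketch using sklearn's StandardScaler, which is my own addition, not from the linked thread): if you standardize each variable to unit variance first, the empirical covariance of the standardized data is the correlation matrix, so its diagonal is all ones:

from sklearn.covariance import EmpiricalCovariance
from sklearn.preprocessing import StandardScaler
import numpy as np

np.random.seed(10)
data = np.random.rand(10, 10)
scaled = StandardScaler().fit_transform(data)  # zero mean, unit variance per column
cov = EmpiricalCovariance().fit(scaled).covariance_
np.allclose(np.diag(cov), 1.0)
# True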

jsga