
I'm looking to better understand the covariance_ attribute returned by scikit-learn's LDA object.

I'm sure I'm missing something, but I expected it to be the covariance matrix of the input data. However, when I compare .covariance_ against the covariance matrix returned by numpy.cov(), I get different results.

Can anyone help me understand what I am missing? Thanks and happy to provide any additional information.

Please find a simple example illustrating the discrepancy below.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Sample Data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 0, 0, 0])

# Covariance matrix via np.cov
print(np.cov(X.T))

# Covariance matrix via LDA
clf = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(clf.covariance_)

1 Answer


In sklearn.discriminant_analysis.LinearDiscriminantAnalysis, the covariance is computed as follows:

import numpy as np

# X and y as defined in the question
cov = np.zeros(shape=(X.shape[1], X.shape[1]))
for c in np.unique(y):
    Xg = X[y == c, :]
    cov += np.count_nonzero(y == c) / len(y) * np.cov(Xg.T, bias=1)

print(cov)
# [[0.66666667 0.33333333]
#  [0.33333333 0.22222222]]

So it corresponds to the sum of the covariance matrices of the individual classes, each weighted by its prior, which by default is the class frequency. Note that the priors are a parameter of LDA.
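As a quick check (a minimal sketch using the question's data, not part of the original answer), you can verify that this pooled within-class matrix matches what the fitted estimator stores in covariance_:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Same data as in the question
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 0, 0, 0])

clf = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)

# Pooled within-class covariance, weighted by class frequency
pooled = np.zeros((X.shape[1], X.shape[1]))
for c in np.unique(y):
    Xg = X[y == c, :]
    pooled += np.count_nonzero(y == c) / len(y) * np.cov(Xg.T, bias=1)

print(np.allclose(clf.covariance_, pooled))  # True if they match
```

Note that each per-class term centers the data on that class's own mean (and uses bias=1, i.e. division by n), whereas np.cov(X.T) centers on the global mean and divides by n - 1, which is why the two matrices differ.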

glemaitre
  • Thank you, this helps. So it looks like they are taking a weighted average of the class-specific covariance matrices, weighted by the priors. Do you happen to have any insight into why they chose this method over the covariance matrix produced by np.cov() above? – user8456066 Dec 21 '19 at 18:25
  • This is the definition of the within-class scatter matrix (https://sebastianraschka.com/Articles/2014_python_lda.html#21-within-class-scatter-matrix-s_w). So, as you said, it is a weighted mean of the class covariances, which is shared across classes when modelling the problem (in LDA). If you use QDA, you will instead have a different covariance matrix for each class. The covariance of the full data does not take the class labels into account. So there are indeed 3 ways of modelling the problem. – glemaitre Dec 22 '19 at 00:33
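The three modelling choices mentioned in this last comment can be put side by side in a short sketch (an illustrative addition using the question's data; QuadraticDiscriminantAnalysis also accepts a store_covariance flag, though note its per-class matrices use the unbiased n - 1 normalization):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 0, 0, 0])

# 1) Full-data covariance: ignores the class labels entirely
print(np.cov(X.T))

# 2) LDA: one pooled within-class covariance shared by all classes
lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(lda.covariance_)

# 3) QDA: one covariance matrix per class
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)
for c, cov_c in zip(qda.classes_, qda.covariance_):
    print(c, cov_c)
```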