
I did some LDA using scikit-learn's LDA class and noticed in the resulting plots that there is a non-zero correlation between the linear discriminants (LDs).

from sklearn.lda import LDA

# X: (n_samples, n_features) feature matrix, y: class labels
sklearn_lda = LDA(n_components=2)
transf_lda = sklearn_lda.fit_transform(X, y)

This is very concerning, so I went back and used the Iris data set as a reference. I also found the same non-zero-correlation LDA plot in the scikit-learn documentation, which I could reproduce.
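To make the problem concrete, here is a minimal, self-contained snippet that reproduces what I am seeing on the Iris data and prints the correlation between LD1 and LD2 (which I would expect to be approximately zero):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.lda import LDA  # old import path, as in the snippet above

iris = load_iris()
X, y = iris.data, iris.target

# project onto the first two linear discriminants
sklearn_lda = LDA(n_components=2)
transf_lda = sklearn_lda.fit_transform(X, y)

# correlation between LD1 and LD2 -- should be ~0 for an LDA projection
print(np.corrcoef(transf_lda[:, 0], transf_lda[:, 1])[0, 1])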

Anyway, to give you an overview of how it looks:

  • Plot in the upper left: there is clearly something wrong here.
  • Plot in the lower left: this is on the raw data; not a correct approach, but one attempt to replicate scikit-learn's results.
  • Plots in the upper right and lower right: this is how it should actually look.

[Figure: four LDA plots; upper left: scikit-learn LDA, lower left: attempt on raw data, upper right and lower right: expected result]

I have put the code into an IPython notebook if you want to take a look at it and try it yourself.

The scikit-learn documentation example that is consistent with the (wrong) result in the upper left: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_lda.html

The LDA in R, which is shown in the lower right: http://tgmstat.wordpress.com/2014/01/15/computing-and-visualizing-lda-in-r/

  • There is a lot of variance scaling built into the scikit-learn LDA as it is now, [see here around line 160](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/lda.py). You might have better luck replicating the results by doing this - I personally don't like all the scaling by default, but this is one of the older algorithms in sklearn from what I understand. There are also multiple algorithms for LDA - I think there is actually a PR to implement a covariance-based (with shrinkage) LDA now. – Kyle Kastner Jul 29 '14 at 08:02
  • Regardless of the scaling, there shouldn't be any correlation between the LDs. I don't know, but it really looks like it is more than just a scaling problem! – Jul 29 '14 at 15:35
  • Look at the sklearn code - then look at your code. There is variance normalization in the sklearn LDA that you are not doing, hence there are different answers. – Kyle Kastner Jul 29 '14 at 16:46
  • @KyleKastner Hm, it looks like z-score normalization (scaling to unit variance) to me: that's what I am doing in the upper right. And what R is also doing by default. But it shouldn't matter whether you normalize (here: z-score) or mean-center the data; the result should be the same since units are measured on the same scale in this dataset (centimeters). –  Jul 29 '14 at 16:50
  • Lines 149:155 of the sklearn code do additional variance scaling within class which is different than just a standard scaler at input. Scalings then get normalized by standard deviation as well. Then there is another normalization (lines 169:174, second SVD) to project scalings to another, similar space. My guess is the algorithm you implemented called LDA is not the same as the algorithm *called* LDA in scikit-learn (which looks like a more specialized LDA with certain assumptions to me). – Kyle Kastner Jul 29 '14 at 18:30
  • In addition, line 162 is doing a whitening of the first scaling components, which you also have not implemented. All these things are probably why your plots are different than sklearn's – Kyle Kastner Jul 29 '14 at 18:35
  • @KyleKastner Thanks, I will experiment with the whitening transform a little bit. Anyway, the algorithm implemented in scikit-learn is confusing and misleading, since it is obviously not doing an LDA in the way that it is commonly described in the literature (e.g., the original 2-class LDA by Fisher, the generalized form by Rao for multiple classes, and the version described in the well-established "Pattern Classification" by Duda, Hart, and Stork). What's especially confusing is that it looks like the variables are correlated, which is totally the opposite of what should happen. – Jul 29 '14 at 19:01
  • I answered further on the [Github issue](https://github.com/scikit-learn/scikit-learn/issues/3500) you raised. – Kyle Kastner Jul 29 '14 at 19:10
  • It would be great if you could pinpoint the differences and find out where exactly and how exactly the implementations do something different. – eickenberg Jul 29 '14 at 20:10

2 Answers


Okay, so what's going on (based on the discussion on GitHub) is that scikit-learn's LDA does not use an orthonormal basis.

I want to post this as an answer so that I can close this question now. Thanks for the discussion!
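A quick way to check this is to look at the projection matrix directly. This is a rough sketch; I am assuming the matrix is stored in the `scalings_` attribute, which may differ between versions:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.lda import LDA  # old import path, as in the question

iris = load_iris()
X, y = iris.data, iris.target
lda = LDA(n_components=2).fit(X, y)

# W holds the first two discriminant directions scikit-learn projects onto
# (assuming the `scalings_` attribute); for an orthonormal basis, W.T @ W
# would be the identity matrix.
W = lda.scalings_[:, :2]
print(np.round(W.T.dot(W), 3))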

Scikit-learn

[Figure: scikit-learn LDA projection]

from sklearn.decomposition import PCA

# apply PCA to the (correlated) scikit-learn LDA projection from above
sklearn_pca = PCA(n_components=2)
transf_pca = sklearn_pca.fit_transform(transf_lda)

[Figure: PCA applied to the scikit-learn LDA projection]

Step-by-step approach

And for comparison, here is the step-by-step approach again:

[Figure: step-by-step LDA projection]
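For completeness, here is a condensed sketch of what I mean by the step-by-step (textbook) approach; the notebook linked in the question does essentially this, just with more plotting:

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# within-class scatter S_W and between-class scatter S_B
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for label in np.unique(y):
    X_c = X[y == label]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T.dot(X_c - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * diff.dot(diff.T)

# eigendecomposition of S_W^-1 S_B; the top eigenvectors are the LDs
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(S_W).dot(S_B))
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:2]].real   # 4x2 projection matrix
X_lda = X.dot(W)                  # samples projected onto LD1 and LD2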


There was indeed a bug in LDA's transform function: the classifier weights were erroneously applied after the actual transform. This has been fixed here. The changes have been merged into the master branch, so the fix should be in the 0.16 release of scikit-learn.
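For anyone who wants to verify this, something along the following lines should show an approximately zero correlation between the LDs once the fix is in your installed version. Note that in newer releases the estimator lives in sklearn.discriminant_analysis as LinearDiscriminantAnalysis, so adjust the import if you are on an older version:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis  # newer import path

iris = load_iris()
X, y = iris.data, iris.target

# with the fixed transform, LD1 and LD2 should come out uncorrelated
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(np.corrcoef(X_lda[:, 0], X_lda[:, 1])[0, 1])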

– MB-F