I'm trying to figure out two things about LDA in mlxtend versus sklearn:

- for 3 classes, why do the LDA results differ between mlxtend and sklearn?
- for 2 classes, why am I unable to get the same discriminants from mlxtend and sklearn?

Everything below uses the iris dataset from mlxtend:
```python
from mlxtend.data import iris_data
from mlxtend.preprocessing import standardize
from mlxtend.feature_extraction import LinearDiscriminantAnalysis

X_iris, y_iris = iris_data()
X_iris = standardize(X_iris)

lda_iris = LinearDiscriminantAnalysis()
lda_iris.fit(X_iris, y_iris)
X_lda_iris = lda_iris.transform(X_iris)

# note: this import shadows mlxtend's class of the same name
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

clf = LinearDiscriminantAnalysis(solver='eigen')
clf.fit(X_iris, y_iris)
transformed_iris = clf.transform(X_iris)
```
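As a side note, before eyeballing plots, one way I compare two projections numerically is to normalize the sign of each column first, since projection axes can come out flipped. This is just a sketch with toy matrices; `align_signs` is a helper name I made up, and the same check could be applied to `X_lda_iris` and `transformed_iris` (possibly also rescaling columns, since the two libraries may normalize eigenvectors differently):

```python
import numpy as np

def align_signs(Z):
    # Flip each column so its entry of largest magnitude is positive,
    # removing the arbitrary per-axis sign before comparison.
    idx = np.abs(Z).argmax(axis=0)
    signs = np.sign(Z[idx, np.arange(Z.shape[1])])
    return Z * signs

A = np.array([[1.0, -2.0], [3.0, 4.0]])
B = A * np.array([-1.0, 1.0])  # same projection, first axis flipped

# After sign alignment the two "projections" coincide.
print(np.allclose(align_signs(A), align_signs(B)))  # True
```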
```python
import matplotlib.pyplot as plt

with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(6, 4))
    for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')):
        plt.scatter(X_lda_iris[y_iris == lab, 0],
                    X_lda_iris[y_iris == lab, 1],
                    label=lab,
                    c=col)
    plt.xlabel('Linear Discriminant 1')
    plt.ylabel('Linear Discriminant 2')
    plt.legend(loc='lower right')
    plt.tight_layout()
    plt.show()
```
```python
with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(6, 4))
    for lab, col in zip((0, 1, 2), ('blue', 'red', 'green')):
        plt.scatter(transformed_iris[y_iris == lab, 0],
                    transformed_iris[y_iris == lab, 1],
                    label=lab,
                    c=col)
    plt.xlabel('Linear Discriminant 1')
    plt.ylabel('Linear Discriminant 2')
    plt.legend(loc='lower right')
    plt.tight_layout()
    plt.show()
```
You can see that the two plots mirror each other.
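My current guess is that this comes from eigenvectors being defined only up to sign: if v is an eigenvector of A, so is -v, and different libraries may pick different signs for the discriminant axes. A minimal sketch (toy matrix, not the actual LDA scatter matrices):

```python
import numpy as np

# If v is an eigenvector of A with eigenvalue lambda, so is -v:
# A v = lambda v  implies  A (-v) = lambda (-v).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)
v = vecs[:, 0]

# Both v and -v satisfy the eigenvector equation.
print(np.allclose(A @ v, vals[0] * v))        # True
print(np.allclose(A @ (-v), vals[0] * (-v)))  # True
```

So a mirror image between the two plots would just mean the two implementations returned axes with opposite signs, not that either result is wrong.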
Second question: how is the LDA in mlxtend able to return more than n - 1 discriminants, while sklearn returns at most n - 1 (n = number of classes in the label)? I would expect both libraries to give exactly the same result.
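For context on why n - 1 matters here, my understanding is that with c classes the between-class scatter matrix S_B is a sum of c rank-1 terms whose weighted mean deviations sum to zero, so rank(S_B) <= c - 1 and at most c - 1 discriminants can carry a nonzero eigenvalue; any further axes correspond to (numerically) zero eigenvalues. A minimal sketch with synthetic data (names and the data setup are my own, not from either library):

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, n = 3, 4, 50                 # 3 classes, 4 features, 50 samples each
X = rng.normal(size=(c * n, d))
y = np.repeat(np.arange(c), n)
X += y[:, None] * 1.0              # shift class means apart

# Between-class scatter: sum of c rank-1 terms. Because the weighted
# class-mean deviations sum to zero, the rank is at most c - 1.
mean_all = X.mean(axis=0)
S_B = np.zeros((d, d))
for k in range(c):
    diff = (X[y == k].mean(axis=0) - mean_all)[:, None]
    S_B += n * (diff @ diff.T)

print(np.linalg.matrix_rank(S_B))  # 2, i.e. c - 1
```

If that is right, then for iris (c = 3, d = 4) only two discriminants are informative; sklearn truncates to those, while mlxtend apparently also returns the remaining axes, whose eigenvalues should be essentially zero.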