
I'm trying to analyse a sparse dataset using sklearn's LDA (and not only that one; I've also tried a personal implementation). The dataset has 14 rows and a varying number of columns, which I select to run different experiments, keeping the columns with the most variance.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA  # assuming LDA is sklearn's LinearDiscriminantAnalysis

X = dfplants.values
print(X.shape)    # (14, 15)
u, s, v = np.linalg.svd(X)
print(len(s))
y = dfplants_sup['tecnique'].values
lda = LDA(n_components=2, solver='svd', store_covariance=True)
X_lda = lda.fit_transform(X, y)

print("X_lda")
print(X_lda)


The output is:

X_lda

[[-6.03602598]
 [-6.14807425]
 [-4.02479902]
 [-5.85982518]
 [-6.96663709]
 [-5.93062031]
 [-6.24874635]
 [ 5.42840829]
 [ 6.5065448 ]
 [ 6.47761884]
 [ 6.50027698]
 [ 6.31051439]
 [ 3.57171076]
 [ 6.41965411]]

It doesn't matter whether I ask for 2 or more components, or whether I keep all the columns or only the two with the most variance: I always get 1 column as a result. Why am I getting only one column? What are the requirements for applying LDA?


1 Answer


As per the documentation:

n_components : int, optional

    Number of components (< n_classes - 1) for dimensionality reduction.

So if you have a binary problem (2 classes), the number of components returned will be n_classes - 1 = 1, no matter what you pass as n_components. That's what you are experiencing.
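
A minimal sketch to see this behaviour (using synthetic random data as a stand-in, since your dfplants DataFrame isn't shown): with 2 classes LDA can return at most 1 discriminant axis, while with 3 classes it can return 2.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.RandomState(0)
X = rng.rand(14, 15)    # synthetic stand-in for dfplants.values: 14 samples, 15 features

# Two classes: at most n_classes - 1 = 1 discriminant axis is available.
y2 = np.array([0] * 7 + [1] * 7)
print(LDA(n_components=1, solver='svd').fit_transform(X, y2).shape)   # (14, 1)

# Three classes: up to n_classes - 1 = 2 axes, so a 2D projection is possible.
y3 = np.array([0] * 5 + [1] * 5 + [2] * 4)
print(LDA(n_components=2, solver='svd').fit_transform(X, y3).shape)   # (14, 2)

So to get a 2-column X_lda you would need at least 3 distinct labels in dfplants_sup['tecnique'].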
