
I am fitting an LSA model on a TF-IDF matrix. My original matrix has shape (20, 22096), and I'm applying TruncatedSVD to perform the LSI/dimensionality reduction:

    import numpy as np
    from sklearn.decomposition import TruncatedSVD

    svd = TruncatedSVD(n_components=200, random_state=42, n_iter=10)
    svdProfile = svd.fit_transform(profileLSAVectors)
    print(np.shape(svdProfile))  # result: (20, 20)

Instead of getting (20, 200), I'm getting (20, 20).

Does anyone have an idea why?

1 Answer


It's the "expected" behaviour in most decomposition procedures in scikit-learn.

I cannot find this mentioned in the documentation of TruncatedSVD, but you can see the documentation for PCA, where it's mentioned that:

n_components == min(n_samples, n_features)
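To see where that cap comes from, here is a minimal sketch using plain NumPy rather than TruncatedSVD itself. The random matrix `X` is only a hypothetical stand-in for the TF-IDF matrix in the question. A matrix with 20 rows has rank at most 20, so its SVD has at most 20 singular values, and projecting onto the singular vectors cannot give more than 20 columns:

```python
import numpy as np

# Hypothetical stand-in for the TF-IDF matrix: 20 documents, 22096 terms.
rng = np.random.default_rng(42)
X = rng.random((20, 22096))

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The number of singular values is min(n_samples, n_features).
print(s.shape)       # (20,)
print(min(X.shape))  # 20

# Projecting the data onto the right singular vectors gives the
# reduced representation; it cannot have more than 20 columns.
X_reduced = X @ Vt.T
print(X_reduced.shape)  # (20, 20)
```

So asking for 200 components from 20 samples cannot produce 200 informative dimensions. To get a (n, 200) result you would need at least 200 rows and at least 200 columns in the input.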

You can try posting this on the scikit-learn GitHub issues page to get more clarity.

Vivek Kumar
  • "It's the desired behaviour in most decomposition procedures" - can you please explain, or link to a reference on, why this is? I'd like to know more. – Raghuveer Dec 26 '20 at 20:04
  • @Raghuveer: A better word would be "expected" instead of "desired". I'm sorry, but I don't have any resources. Maybe you can look at the PCA documentation mentioned above and go through the research papers linked there for details. – Vivek Kumar Dec 28 '20 at 08:57