Scikit-learn TruncatedSVD documentation

Question

I plan to use sklearn.decomposition.TruncatedSVD to perform LSA for a Kaggle competition. I know the math behind SVD and LSA, but I'm confused by scikit-learn's user guide, so I'm not sure how to actually apply TruncatedSVD.

In the doc, it states that:

After this operation,

U_k * transpose(S_k) is the transformed training set with k features (called n_components in the API)

Why is this? I thought that after SVD, the rank-k approximation X_k should be U_k * S_k * transpose(V_k).

And then it says,

To also transform a test set X, we multiply it with V_k: X' = X * V_k

What does this mean?

score 1 · Answer 1 · edited Dec 06 '19 at 18:34

I like the documentation here a bit better. Scikit-learn is pretty consistent in that you almost always use some combination of the following code:

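# import the desired sklearn class
import numpy as np
from sklearn.decomposition import TruncatedSVD

# placeholder arrays of shape (n_samples, n_features); substitute your own data
rng = np.random.default_rng(0)
trainData = rng.random((100, 20))
testData = rng.random((30, 20))

model = TruncatedSVD(n_components=5, random_state=42)
model.fit(trainData)  # fit the model on the training data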

If you want to transform the data in the same step rather than only fit the model:

trainTransformed = model.fit_transform(trainData)  # fit the model and return the transformed training data
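Since the question is about LSA, here is a rough sketch of what that step might look like on text. The toy corpus and the variable names are made up for illustration, and using TfidfVectorizer to build the term matrix is an assumption about the pipeline, not something stated in the question.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, for illustration only
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]

# Build a sparse tf-idf matrix of shape (n_documents, n_terms)
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Reduce every document to 2 latent components
lsa = TruncatedSVD(n_components=2, random_state=42)
X_train_lsa = lsa.fit_transform(X_train)

print(X_train.shape, "->", X_train_lsa.shape)
```

The output keeps one row per document and has n_components columns, which is the "transformed training set with k features" from the user guide.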

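Similarly, to apply the already fitted model to new data such as a test set, call transform (TruncatedSVD is a transformer, so it has no predict method):

testTransformed = model.transform(testData)  # project the test data onto the components learned from trainData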
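To connect these calls back to the formulas quoted in the question, the following sketch checks them numerically against NumPy's full SVD. The random placeholder data and variable names are assumptions, and algorithm="arpack" is chosen here only so the comparison holds to numerical precision (it needs n_components smaller than the number of features).

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_train = rng.random((50, 20))  # placeholder training data
X_test = rng.random((10, 20))   # placeholder test data
k = 5

svd = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0)
Z_train = svd.fit_transform(X_train)  # the k-feature training set, shape (50, k)
V_k = svd.components_.T               # components_ stores transpose(V_k); V_k has shape (20, k)

# Reference: full SVD from NumPy, truncated to the top k singular triplets
U, s, Vt = np.linalg.svd(X_train, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k]     # rank-k approximation, shape (50, 20)

# Check 1: the transformed training set U_k * S_k equals X * V_k
print(np.allclose(Z_train, X_train @ V_k))

# Check 2: multiplying back by transpose(V_k) gives X_k in the original feature space
print(np.allclose(Z_train @ V_k.T, X_k))

# Check 3: transform on a test set is the same projection, X' = X * V_k
Z_test = svd.transform(X_test)
print(np.allclose(Z_test, X_test @ V_k))
```

So U_k * S_k * transpose(V_k) is still the rank-k approximation X_k, but it has as many columns as the original data. The reduced representation with k columns is U_k * S_k, which is the same as X * V_k, and that is why a test set is transformed by multiplying it with V_k.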

Hope that helps...

edited Dec 06 '19 at 18:34

LYu

answered Mar 28 '16 at 01:45

flyingmeatball

`---> 13 svd.predict(test) AttributeError: 'TruncatedSVD' object has no attribute 'predict' ` – Rahul Bali Jun 23 '18 at 15:01
Edited with `transform` instead of `predict` – LYu Dec 06 '19 at 16:08
