I plan to use sklearn.decomposition.TruncatedSVD to perform LSA for a Kaggle competition. I know the math behind SVD and LSA, but scikit-learn's user guide confuses me, so I'm not sure how to actually apply TruncatedSVD.

In the doc, it states that:

After this operation, U_k * transpose(S_k) is the transformed training set with k features (called n_components in the API)

Why is this? I thought that after truncated SVD, X is approximated by X_k = U_k * S_k * transpose(V_k)?

And then it says,

To also transform a test set X, we multiply it with V_k: X' = X * V_k

What does this mean?

cchamberlain
howard

1 Answer

I like the documentation here a bit better. scikit-learn is pretty consistent: you almost always use some combination of the following code:

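# import the desired sklearn class
import numpy as np
from sklearn.decomposition import TruncatedSVD

# placeholder data for illustration; substitute your own
# (n_samples, n_features) arrays with matching feature counts
rng = np.random.RandomState(0)
trainData = rng.rand(100, 20)
testData = rng.rand(10, 20)

model = TruncatedSVD(n_components=5, random_state=42)
model.fit(trainData)  # fit the model on the training data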

If you want to transform the training data as well as fit the model, do both in one step:

trainReduced = model.fit_transform(trainData)  # fit, then return the reduced training data

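Similarly, to apply the already-fitted model to new data such as a test set, call transform. TruncatedSVD is a transformer rather than a predictor, so the result is the reduced test data, not a prediction:

testReduced = model.transform(testData)  # project the test data onto the fitted components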
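To connect this back to the formulas quoted in the question, here is a minimal sketch. It assumes small random dense arrays as stand-in data, and the names X_train, X_test, and k are made up for illustration. The fitted attribute components_ holds transpose(V_k), and transform multiplies its input by V_k. Because the columns of V_k are orthonormal, X * V_k for the training matrix collapses to U_k * S_k, which has k columns; the rank-k reconstruction U_k * S_k * transpose(V_k) would still have the original number of features. S_k is diagonal, so transpose(S_k) equals S_k.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Stand-in data (an assumption for this sketch): 12 features in both sets.
rng = np.random.RandomState(0)
X_train = rng.rand(50, 12)
X_test = rng.rand(8, 12)

k = 3  # number of components to keep (n_components in the API)
svd = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0)
train_reduced = svd.fit_transform(X_train)  # shape (50, k)
test_reduced = svd.transform(X_test)        # shape (8, k)

# components_ has shape (k, n_features), i.e. it stores transpose(V_k).
V_k = svd.components_.T

# transform is the projection X * V_k, for test and training data alike.
test_matches_projection = np.allclose(test_reduced, X_test @ V_k)
train_matches_projection = np.allclose(train_reduced, X_train @ V_k)

# Compare against a full SVD computed with NumPy: U_k * S_k.
U, s, Vt = np.linalg.svd(X_train, full_matrices=False)
U_k_S_k = U[:, :k] * s[:k]

# Singular vectors are only defined up to sign, so compare magnitudes.
train_matches_U_S = np.allclose(np.abs(train_reduced), np.abs(U_k_S_k), atol=1e-6)

print(train_reduced.shape, test_reduced.shape)
print(test_matches_projection, train_matches_projection, train_matches_U_S)
```

The algorithm="arpack" setting is chosen here only so the comparison with the NumPy SVD is tight; the default randomized solver computes the same quantities approximately.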

Hope that helps...

LYu
flyingmeatball