I would like to try out some variations around Latent Semantic Analysis (LSA) with scikit-learn. Besides pure frequency counts from CountVectorizer()
and the weighted result of TfidfTransformer()
, I'd like to test weighting by entropy (and log-entropy) (used in the original papers and reported to perform very well).
Any suggestions on how to proceed? I know Gensim has an implementation (LogEntropyModel()
) but would prefer to stick with scikit-learn.