Transforming CountVectorizer with entropy (log-entropy) / sklearn

Asked Feb 25 '14 at 11:14


I would like to try out some variations on Latent Semantic Analysis (LSA) with scikit-learn. Besides the raw frequency counts from CountVectorizer() and the weighted output of TfidfTransformer(), I'd like to test entropy and log-entropy weighting, which were used in the original LSA papers and are reported to perform very well.

Any suggestions on how to proceed? I know Gensim has an implementation (LogEntropyModel()) but would prefer to stick with scikit-learn.

asked Feb 25 '14 at 11:14

emiguevara


Well, you need to implement that formula as a transformer or tie scikit-learn and Gensim together. Which part is giving you trouble? – Fred Foo Feb 26 '14 at 11:25
Hi, I haven't started yet... so no trouble so far. I am just asking for suggestions. – emiguevara Feb 26 '14 at 11:46
I think the scikit-learn mailing list is more appropriate for this kind of question. – Fred Foo Feb 26 '14 at 13:08
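
For reference, the first comment's suggestion (implementing the formula as a transformer) could be sketched roughly as follows. This is a minimal sketch, not part of scikit-learn: the class name LogEntropyTransformer and the attribute global_weights_ are hypothetical. It assumes one common formulation of log-entropy weighting, where each count tf is replaced by log(1 + tf) and multiplied by a per-term global weight 1 + sum_d(p * log p) / log(n_docs), with p = tf / (total count of the term) and the sum taken over documents. Other variants exist, so check the formula against the paper you are following.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class LogEntropyTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical log-entropy weighting for a document-term count matrix.

    Assumes rows are documents, columns are terms, and counts are >= 0.
    """

    def fit(self, X, y=None):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        X.eliminate_zeros()  # avoid log(0) on explicitly stored zeros
        n_docs, n_terms = X.shape

        # Total count of each term over the whole corpus.
        gf = np.asarray(X.sum(axis=0)).ravel()

        # p = tf / gf for every stored (document, term) entry;
        # X.indices holds the column (term) index of each entry.
        p = X.data / gf[X.indices]

        # Per-term sum of p * log(p); this is <= 0.
        ent = np.bincount(X.indices, weights=p * np.log(p), minlength=n_terms)

        # Normalise by log(n_docs); guard the single-document case.
        denom = np.log(n_docs) if n_docs > 1 else 1.0
        self.global_weights_ = 1.0 + ent / denom
        return self

    def transform(self, X):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        # Local weight log(1 + tf), scaled by the term's global weight.
        X.data = np.log1p(X.data) * self.global_weights_[X.indices]
        return X


# Usage: apply it to CountVectorizer output, like TfidfTransformer.
docs = ["apple banana", "apple cherry"]
counts = CountVectorizer().fit_transform(docs)
weighted = LogEntropyTransformer().fit_transform(counts)
```

A term spread evenly over all documents ("apple" here) gets a global weight near 0, while a term concentrated in a single document gets a weight of 1. The result stays sparse, so it can be passed to TruncatedSVD for the LSA step.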
