I am trying to find the top 100/1000 words based on the output of `TfidfVectorizer` from Python's scikit-learn library. Is there a way to do this using a function from scikit-learn?

Thanks for the help.

Kyle Kelley
Harshit
  • I mean the top 100/1000 words based on the tf-idf values given by the tf-idf vectorizer. I tried to sum up the values for every column, but indexing is not allowed in the sparse representation. – Harshit Oct 28 '13 at 07:22

1 Answer

What do you mean by top 100/1000 words? The most frequent words in a dataset? If so, you can use the `Counter` class from the `collections` module of the Python standard library. There is no need for scikit-learn.
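
For the plain word-frequency reading of the question, here is a minimal sketch of the `Counter` approach. The helper name `top_words` and the simple `\w+` tokenization are assumptions made for illustration; a real corpus may need a different tokenizer.

```python
import re
from collections import Counter


def top_words(documents, n=100):
    """Return the n most frequent words across an iterable of strings.

    Hypothetical helper: tokenization is a plain lowercase \\w+ split,
    which is an assumption and not something scikit-learn does for you.
    """
    counts = Counter()
    for doc in documents:
        # Lowercase so "The" and "the" are counted together.
        counts.update(re.findall(r"\w+", doc.lower()))
    # most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)


docs = ["the cat sat", "the cat ran", "the dog"]
print(top_words(docs, n=2))
```

Note that this ranks by raw frequency only, so it will not match a tf-idf based ranking.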

ogrisel
  • I mean the top 100/1000 words based on the tf-idf values given by the tf-idf vectorizer. I tried to sum up the values for every column, but indexing is not allowed in the sparse representation. – Harshit Oct 27 '13 at 05:57
  • @user595169 Do you mean `X.sum(0)`? – Fred Foo Oct 27 '13 at 12:11
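
For the tf-idf reading, the `X.sum(0)` idea from the comment above could be sketched roughly as follows. This is only one possible interpretation of "top words" (ranking by the summed tf-idf weight of each column); the helper name `top_tfidf_terms` is hypothetical, and the default `TfidfVectorizer` settings are an assumption.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def top_tfidf_terms(documents, n=100):
    """Rank terms by their tf-idf weight summed over all documents.

    Hypothetical helper; uses default TfidfVectorizer settings.
    """
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(documents)  # sparse, shape (n_docs, n_terms)

    # Summing over axis 0 works directly on the sparse matrix, so no
    # element-wise indexing is needed. Flatten the result to a 1-D array.
    scores = np.asarray(X.sum(axis=0)).ravel()

    # vocabulary_ maps term -> column index; invert it to look terms up.
    terms = np.empty(len(vectorizer.vocabulary_), dtype=object)
    for term, col in vectorizer.vocabulary_.items():
        terms[col] = term

    # Column indices sorted by descending score, truncated to n.
    order = np.argsort(scores)[::-1][:n]
    return [(terms[i], float(scores[i])) for i in order]


docs = ["apple banana", "apple cherry", "apple"]
for term, score in top_tfidf_terms(docs, n=2):
    print(term, round(score, 3))
```

Taking the column mean or maximum instead of the sum would give a different ranking, so which aggregate is appropriate depends on what "top" should mean for the dataset.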