
I have a fairly large document and want to do stop-word elimination and stemming on its words with Python. Does anyone know of an off-the-shelf package for this? If not, code that is fast enough for large documents is also welcome. Thanks

a paid nerd
Hossein

2 Answers


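NLTK supports both: it ships an English stop-word list in its `stopwords` corpus and several stemmers, such as `PorterStemmer`.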

Ken Bloom

If for some reason you don't want to use NLTK, you can try PyStemmer. For stop words, just download a list (google it) and filter those words out before stemming.

Miki Tebeka