I am using two modules for NLP: nltk and hunspell. I use hunspell because I have suffix and affix rules that need to be followed.

from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
stemmer.stem('ladies')

ladi

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize('ladies')

lady

The nltk module works as expected, as shown above. But the hunspell module seems to support only lemmatization, and I can find no way to get the stemmed form from it.

import hunspell
hobj = hunspell.HunSpell('en_US.dic', 'en_US.aff')
hobj.stem('ladies')

This returns "lady" and not "ladi" as one would expect. Is there any way to return the stemmed form of a word using the hunspell module?
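
To make the difference concrete, here is a minimal toy sketch of why the two outputs can differ. It is not NLTK's or Hunspell's actual implementation. `toy_stem` blindly rewrites a suffix in the spirit of Porter's first step ("ies" becomes "i"), while `toy_lemma` undoes a suffix rule and accepts the result only if it appears in a word list, which is roughly how a dictionary-backed analyzer would arrive at "lady". The word list, the rule table and both function names are made up for illustration.

```python
# Toy illustration only: neither function reproduces NLTK or Hunspell.

# Hypothetical word list standing in for a .dic file.
DICTIONARY = {"lady", "pony", "cat"}

# Hypothetical (suffix, replacement) pairs standing in for .aff suffix rules.
AFFIX_RULES = [("ies", "y"), ("s", "")]


def toy_stem(word):
    """Rewrite a plural suffix without checking that the result is a real word."""
    if word.endswith("sses"):
        return word[:-2]      # "sses" -> "ss"
    if word.endswith("ies"):
        return word[:-2]      # "ies" -> "i"
    if word.endswith("ss"):
        return word           # leave "ss" alone
    if word.endswith("s"):
        return word[:-1]      # drop a plain plural "s"
    return word


def toy_lemma(word):
    """Undo a suffix rule only if the candidate is in the dictionary."""
    if word in DICTIONARY:
        return word
    for suffix, replacement in AFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in DICTIONARY:
                return candidate
    return None  # no rule produced a known word


print(toy_stem("ladies"))   # ladi
print(toy_lemma("ladies"))  # lady
```

Under these assumptions, the dictionary check is what separates the two results: a rule-only rewrite is free to produce a non-word such as "ladi", whereas a dictionary-validated rule can only return entries that exist in the word list.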

David Batista
shantanuo
  • Because `stemmer != lemmatizer` and `(stemmer | lemmatizer) != spellchecker`. It's an XY sort of problem to conflate a stemmer (Porter), a lemmatizer (WordNet morphy) and a spellchecker (Hunspell) ;P – alvas Oct 28 '18 at 14:59
  • @alvas, in my opinion, with well-written "dictionaries", hunspell is far more than a spellchecker (it is a lexical analyzer, including lemmatization). – JJoao Jun 04 '19 at 10:35

0 Answers