
I'm using the Hunspell stemmer in Python. Some of the stems are not what I want, and I would like Hunspell to leave certain words alone when stemming.

In particular, it converts ABS to AB and table to t.

In the aff file, I saw ABS and AB in the list and tried removing them, but that didn't work.

Found the solution: when I changed the rules in the aff file, I had only edited the entry for "ABS". If you keep scrolling, there is a second list further down the file that repeats the same words in lowercase (so "abs"), and you have to make the same edits there too.
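Because the same word can show up under more than one casing, a case-insensitive scan of the file can help locate every entry before editing. The following is only a rough sketch: `find_word_lines` is a hypothetical helper, not part of Hunspell, and the sample lines (including the `/S` and `/SM` flags) are made up for illustration rather than taken from a real dictionary.

```python
import re

def find_word_lines(lines, word):
    """Return (line_number, text) pairs for lines that mention `word` in any casing."""
    # Match `word` only when it is not embedded in a longer alphabetic token,
    # so "abs" does not match inside "absent".
    pattern = re.compile(
        r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])",
        re.IGNORECASE,
    )
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]

# Made-up sample entries; with a real file, pass an open file object instead.
sample = ["ABS/S\n", "table/SM\n", "abs/S\n", "absent\n"]
print(find_word_lines(sample, "ABS"))  # [(1, 'ABS/S'), (3, 'abs/S')]
```

With a real file, something like `find_word_lines(open(path, encoding="utf-8"), "ABS")` would list both the uppercase and the lowercase entries, assuming the file is UTF-8 encoded.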

  • Assuming you are using [this](https://github.com/blatinier/pyhunspell), and you call the stemmer for each word individually, why don't you just blacklist abbreviations in your code, and do some sanity check for the other issue (e.g. if the result is less than half as long as the original, use the original instead of the stemmed form) – L3viathan Aug 05 '19 at 07:23
  • Yes, I am using that library, and I stem words individually. Is there a way to specify a blacklist of words directly in Hunspell? Doing it in code is not a scalable solution because there are other words as well... There is a way to do it directly in the aff and dict files; it's just not well documented what each item in the aff file means. – Andrea Russett Aug 05 '19 at 09:41
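
The code-side workaround suggested in the first comment (a blacklist plus a length sanity check) could look roughly like the sketch below. `safe_stem`, `keep_as_is`, and `min_ratio` are hypothetical names, and the stemmer is passed in as a plain callable so the example runs without Hunspell installed. It assumes the callable returns a list of candidate stems, as pyhunspell's `stem` method does (possibly as `bytes`).

```python
def safe_stem(word, stem_fn, keep_as_is=frozenset(), min_ratio=0.5):
    """Stem `word` with `stem_fn`, keeping the original when the stem looks wrong."""
    # Words on the blacklist are never stemmed.
    if word in keep_as_is:
        return word
    candidates = stem_fn(word)
    if not candidates:
        return word
    stem = candidates[0]
    # Some bindings return bytes; decode so the result is always a str.
    if isinstance(stem, bytes):
        stem = stem.decode("utf-8")
    # Sanity check: reject stems shorter than `min_ratio` of the original length.
    if len(stem) < min_ratio * len(word):
        return word
    return stem

# Stand-in stemmer that reproduces the behaviour described in the question.
FAKE_STEMS = {"ABS": ["AB"], "table": ["t"], "knights": [b"knight"]}

def fake_stem(word):
    return FAKE_STEMS.get(word, [])

print(safe_stem("ABS", fake_stem, keep_as_is={"ABS"}))  # ABS
print(safe_stem("table", fake_stem))                    # table
print(safe_stem("knights", fake_stem))                  # knight
```

Note that the length check alone would not catch ABS to AB, since "AB" is more than half as long as "ABS"; that case needs the blacklist. With pyhunspell, the bound `stem` method of a `hunspell.HunSpell` object could presumably be passed as `stem_fn`.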
