
I'm about to launch into a Lucene.NET implementation and I am concerned about using the PorterStemFilter. From what I've read here and in the source code, it appears to be far, far too aggressive for my needs.

I need something simpler that doesn't look for roots but just removes "er", "ed", "s", etc. suffixes. From what I've read, KStem would do the trick.
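A minimal sketch of the kind of light stemmer described above, assuming plain lowercase English input. It is written in Java (the language of the KStem source) and should translate to C# almost line for line. The class name `LightStemmer` and its length guards are hypothetical choices for illustration; this is neither KStem nor a Lucene.NET API.

```java
import java.util.Locale;

// Hypothetical light stemmer: strips a few inflectional suffixes ("ies", "s",
// "ed", "er") and never tries to find a linguistic root. Not KStem.
final class LightStemmer {
    private LightStemmer() {}

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int n = w.length();
        // Leave very short words alone so "is", "was", "bed" are not mangled.
        if (n <= 3) return w;
        // "ponies" -> "pony"; the length guard keeps "ties" from becoming "ty"
        if (w.endsWith("ies") && n > 4) return w.substring(0, n - 3) + "y";
        // "glass", "boss": a double s is not a plural marker
        if (w.endsWith("ss")) return w;
        // "cats" -> "cat"
        if (w.endsWith("s")) return w.substring(0, n - 1);
        // "jumped" -> "jump"; the guard keeps "need" and "seed" intact
        if (w.endsWith("ed") && n > 4) return w.substring(0, n - 2);
        // "jumper" -> "jump"; the guard keeps "beer" and "over" intact
        if (w.endsWith("er") && n > 4) return w.substring(0, n - 2);
        return w;
    }
}
```

Even rules this simple misfire: `water` becomes `wat`, because nothing checks whether the result is a real word. KStem avoids that class of error by validating candidate stems against a lexicon.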

I can't for the life of me find a .NET version of KStem. I can't even find the source code for the Java version to hand-roll a port.

Could someone point me in the right direction?

It looks like it would be easy enough to hand-craft a reduced PorterStemmer by simply removing the steps I don't want. Has anyone had success with that?
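A reduced Porter stemmer along those lines can be sketched by keeping only steps 1a (plurals) and 1b (-eed/-ed/-ing plus its fix-ups) of Porter's published algorithm and dropping the later steps that strip derivational suffixes. This is a standalone Java port written from the published rules, assuming lowercase ASCII input; the class name `ReducedPorter` is hypothetical, and it is not Lucene.NET's `PorterStemmer` class with code deleted.

```java
import java.util.Locale;

// Sketch of a "reduced" Porter stemmer: only steps 1a and 1b of Porter's 1980
// algorithm. Steps 1c to 5 (e.g. -ization, -ness, -ive) are deliberately omitted.
final class ReducedPorter {
    private ReducedPorter() {}

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Porter's reference C code also leaves words of length 1 or 2 alone.
        if (w.length() <= 2) return w;
        return step1b(step1a(w));
    }

    // Consonant per Porter: not a/e/i/o/u, and 'y' only when not preceded by a consonant.
    static boolean isConsonant(String s, int i) {
        switch (s.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(s, i - 1);
            default:
                return true;
        }
    }

    // m in the form [C](VC)^m[V]: the number of vowel-consonant sequences.
    static int measure(String s) {
        int i = 0, m = 0, n = s.length();
        while (i < n && isConsonant(s, i)) i++;
        while (i < n) {
            while (i < n && !isConsonant(s, i)) i++;
            if (i == n) break;
            while (i < n && isConsonant(s, i)) i++;
            m++;
        }
        return m;
    }

    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) if (!isConsonant(s, i)) return true;
        return false;
    }

    static boolean endsWithDoubleConsonant(String s) {
        int n = s.length();
        return n >= 2 && s.charAt(n - 1) == s.charAt(n - 2) && isConsonant(s, n - 1);
    }

    // *o: ends consonant-vowel-consonant, where the last consonant is not w, x or y.
    static boolean endsCvc(String s) {
        int n = s.length();
        if (n < 3 || !isConsonant(s, n - 3) || isConsonant(s, n - 2) || !isConsonant(s, n - 1)) return false;
        char c = s.charAt(n - 1);
        return c != 'w' && c != 'x' && c != 'y';
    }

    static String step1a(String w) {
        if (w.endsWith("sses") || w.endsWith("ies")) return w.substring(0, w.length() - 2); // caresses -> caress, ponies -> poni
        if (w.endsWith("ss")) return w;                                                      // caress -> caress
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);                          // cats -> cat
        return w;
    }

    static String step1b(String w) {
        if (w.endsWith("eed")) {
            // (m>0) EED -> EE: agreed -> agree, but feed stays feed
            return measure(w.substring(0, w.length() - 3)) > 0 ? w.substring(0, w.length() - 1) : w;
        }
        String stem;
        if (w.endsWith("ed")) stem = w.substring(0, w.length() - 2);
        else if (w.endsWith("ing")) stem = w.substring(0, w.length() - 3);
        else return w;
        if (!containsVowel(stem)) return w; // (*v*) condition: bled, sing unchanged
        if (stem.endsWith("at") || stem.endsWith("bl") || stem.endsWith("iz")) return stem + "e"; // conflated -> conflate
        if (endsWithDoubleConsonant(stem)) {
            char c = stem.charAt(stem.length() - 1);
            // hopping -> hop, but falling -> fall
            return (c == 'l' || c == 's' || c == 'z') ? stem : stem.substring(0, stem.length() - 1);
        }
        if (measure(stem) == 1 && endsCvc(stem)) return stem + "e"; // filing -> file
        return stem;
    }
}
```

With only these two steps, `generalization` passes through unchanged, whereas the full algorithm reduces it to `gener`. Note that step 1a still produces non-words such as `poni`, so it is lighter than full Porter but not dictionary-accurate.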

Edited by rae1; asked by Kevin
  • The Java source for KStem is available at: http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/ – Jf Beaulac Mar 18 '13 at 18:01
  • Does anyone have experience comparing PorterStemmer and KStemmer? – Kevin Mar 19 '13 at 14:47

1 Answer


You could use the HunspellStemmer, which is part of contrib. It uses freely available Hunspell dictionaries to provide proper, dictionary-based stemming.
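Why dictionary-based stemming is more conservative can be shown with a toy Java sketch (Java 9+ for `Map.of`/`Set.of`): each dictionary stem lists the suffixes it accepts, loosely mirroring Hunspell's affix flags, and a suffix is stripped only if the remainder is a listed stem that allows it. The dictionary contents and the class name `DictionaryStemmer` are made up for illustration; this is not Lucene.NET's `HunspellStemmer` API, nor the real .dic/.aff file format.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy dictionary-validated stemmer in the spirit of Hunspell (hypothetical):
// a suffix is removed only when the remainder is a known stem flagged for it.
final class DictionaryStemmer {
    // Hypothetical mini dictionary: stem -> suffixes it may take
    // (a stand-in for Hunspell's per-word affix flags).
    private static final Map<String, Set<String>> DICT = Map.of(
            "jump", Set.of("s", "ed", "er", "ing"),
            "walk", Set.of("s", "ed", "er", "ing"),
            "water", Set.of("s", "ed", "ing"));

    // Candidate suffixes, longest first.
    private static final List<String> SUFFIXES = List.of("ing", "ed", "er", "s");

    private DictionaryStemmer() {}

    static String stem(String word) {
        if (DICT.containsKey(word)) return word; // already a known stem
        for (String suffix : SUFFIXES) {
            if (!word.endsWith(suffix)) continue;
            String candidate = word.substring(0, word.length() - suffix.length());
            Set<String> allowed = DICT.get(candidate);
            if (allowed != null && allowed.contains(suffix)) return candidate;
        }
        return word; // unknown forms are left unchanged rather than guessed
    }
}
```

Unknown words such as `bother` come back unchanged instead of being cut to `both`, and `water` is never reduced to `wat`; the trade-off is that coverage depends entirely on the dictionary.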

Answered by sisve