I'm about to launch into a Lucene.NET implementation and I am concerned about using the PorterStemFilter
. Reading here, and reading source code, it appears to be far, far too aggressive for my needs.
I need something simpler that doesn't look for roots but just removes "er", "ed", "s", etc suffixes. From what I've read, KStem would do the trick.
I can't for the life of me find a .NET version of KStem. I can't even find source code for the Java version to handroll a port.
Could someone point me in the right direction?
Looks like it is easy enough to handcraft a reduced PorterStemmer by simply removing steps I don't want. Anyone have success with that?