I'm evaluating a switch from the Porter stemming filter to KStem in Solr. I've seen references to configuring KStem through a direct_conflations.txt file and other files, but I can't find documentation on how these files should be formatted or how to tell KStem to load them.

Here is an example Solr config in schema.xml that loads KStem:

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  <filter class="solr.KStemFilterFactory"/>
</analyzer>

With Porter, you are able to configure protected words like so:

<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

Is there a comparable way to configure KStem? If so, does anyone know how, or where it is documented?

Reggie Pharkle

1 Answer

Your best bet is to read the Solr source code. I took a quick look and found that, unlike EnglishPorterFilterFactory, KStemFilterFactory does not look for a protected-words list. HTH.
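
One possible workaround, offered as a sketch rather than a confirmed recipe: put a solr.KeywordMarkerFilterFactory ahead of the stemmer so that the words in protwords.txt are flagged as keywords before stemming. This assumes your Solr version ships KeywordMarkerFilterFactory and that its KStem filter skips keyword-flagged tokens, so check both against your release. The file name protwords.txt is simply reused from the Porter example in the question.

```xml
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Flags every token listed in protwords.txt as a keyword -->
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <!-- Assumption: the stemmer leaves keyword-flagged tokens unchanged -->
  <filter class="solr.KStemFilterFactory"/>
</analyzer>
```

The marker filter has to come before the stemmer in the chain, and the index-time analyzer needs the same change so that query and index terms stay consistent.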

user1452132