I am looking for help on how I can use the class PorterStemFilter in Lucene 4.0. Below is my indexer taken from http://www.lucenetutorial.com/lucene-in-5-minutes.html:

...

  StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
  Directory index = new RAMDirectory();
  IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

  IndexWriter w = new IndexWriter(index, config);
  addDoc(w, "Lucene in Action", "193398817");
  addDoc(w, "Lucene for Dummies", "55320055Z");

......

Could someone help me with where and how to use the PorterStemFilter class?

femtoRgon
user2161903

1 Answer

Filters are generally incorporated into an Analyzer. To create your own Analyzer in Lucene 4.x, the only thing you really need to override is the createComponents method (tokenStream is final in 4.x, so it can no longer be overridden).

If you just want to chuck the stem filter into StandardAnalyzer, I would copy the implementation of createComponents from StandardAnalyzer into your own Analyzer subclass (StandardAnalyzer itself is final) and add the filter at the appropriate location. Stemmers should usually be added late in the filter chain.

// Lucene 4.6; override this in your own subclass of Analyzer
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    StandardTokenizer source = new StandardTokenizer(Version.LUCENE_46, reader);
    source.setMaxTokenLength(255);
    TokenStream result = new StandardFilter(Version.LUCENE_46, source);
    result = new LowerCaseFilter(Version.LUCENE_46, result);
    result = new StopFilter(Version.LUCENE_46, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    // Add the stem filter last, after lowercasing and stop-word removal
    result = new PorterStemFilter(result);
    return new TokenStreamComponents(source, result);
}

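Alternatively, you could just use EnglishAnalyzer (or the equivalent analyzer for another language), which already includes a stemmer. Use the same analyzer when indexing and when parsing queries, so that both sides produce the same stemmed terms.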
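
To see why a stemmer lets a query for "country" match documents that contain "countries", here is a minimal sketch in plain Java. It is a toy, not Lucene's PorterStemFilter and not the full Porter algorithm: it applies only the two Porter rules that matter for this pair (a trailing "ies" becomes "i", and a trailing "y" becomes "i" when the rest of the word contains a vowel). The names ToyStemDemo, stem and hasVowel are made up for illustration.

```java
import java.util.Locale;

// Toy illustration only: NOT the full Porter algorithm and NOT Lucene's PorterStemFilter.
final class ToyStemDemo {

    // True if the string contains at least one vowel (simplified to a, e, i, o, u).
    static boolean hasVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ("aeiou".indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Lowercases the token, then applies two Porter-style suffix rules.
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("ies")) {
            // "ies" -> "i", e.g. countries -> countri
            return t.substring(0, t.length() - 2);
        }
        if (t.endsWith("y") && hasVowel(t.substring(0, t.length() - 1))) {
            // terminal "y" -> "i" when the stem has a vowel, e.g. country -> countri
            return t.substring(0, t.length() - 1) + "i";
        }
        return t;
    }

    public static void main(String[] args) {
        String indexed = stem("countries"); // term as it would be stored at index time
        String queried = stem("country");   // term as it would be produced at query time
        System.out.println(indexed + " / " + queried + " / match=" + indexed.equals(queried));
    }
}
```

Both words reduce to the same term, "countri", so they match only if the stemming step runs on the indexed text and on the query. With an analyzer that does no stemming, "country" and "countries" stay distinct terms and the query misses.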

femtoRgon
  • I don't really need to create my own Analyzer if they already handle Filters. But I was wondering why I am not getting hits for queries like 'country'. I can get hits for the query 'countries'. – user2161903 Feb 21 '14 at 22:33
  • Not sure I understand... So you have documents with "countries", and want a query for "country" to find them? Yes, that would be an appropriate time to use a stemmer. – femtoRgon Feb 21 '14 at 22:40
  • Yes, that is exactly what I wanted. I am using StandardAnalyzer as my analyzer and Lucene version 4.6. If stemmer is incorporated in the analyzer, why is it failing to retrieve documents when the query is 'country'? – user2161903 Feb 21 '14 at 23:48
  • No, there is no stemming in `StandardAnalyzer`. Use `EnglishAnalyzer` (link in the answer above). – femtoRgon Feb 22 '14 at 00:38