Using stop words with WhitespaceAnalyzer

Question

Lucene's StandardAnalyzer removes the dots from acronyms when indexing text. I want Lucene to retain the dots, so I'm using the WhitespaceAnalyzer class.

I can give my list of stop words to StandardAnalyzer, but how do I give it to WhitespaceAnalyzer?

Thanks for reading.

score 6 · Accepted Answer · answered May 08 '09 at 19:20


Create your own analyzer by extending WhitespaceAnalyzer and overriding the tokenStream method as follows:

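@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    // Split on whitespace only, so tokens like "U.S.A." keep their dots
    TokenStream result = super.tokenStream(fieldName, reader);
    // Wrap the stream so any token contained in stopSet is dropped
    result = new StopFilter(result, stopSet);
    return result;
}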

Here, stopSet is the Set of stop words. You can supply it by adding a constructor to your analyzer that accepts a list of stop words.
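Running the analyzer above requires the Lucene jar. As a rough, JDK-only sketch of what that chain does (this is not Lucene's API; the class WhitespaceStopSketch and its analyze method are hypothetical names for illustration), the following takes a list of stop words in its constructor, splits text on whitespace, and drops any token found in the stop set:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, JDK-only illustration of "whitespace tokenizer + stop filter".
 * Not Lucene code: it only mimics the idea of the analyzer chain.
 */
class WhitespaceStopSketch {
    private final Set<String> stopSet;

    // Mirrors the suggested constructor that accepts a list of stop words
    WhitespaceStopSketch(List<String> stopWords) {
        this.stopSet = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Whitespace-only splitting keeps punctuation such as the dots in "U.S.A."
        for (String token : text.trim().split("\\s+")) {
            // Skip the empty string produced by blank input, and skip stop words
            if (!token.isEmpty() && !stopSet.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, with stop words "the", "a" and "of", analyzing "the history of the U.S.A." yields [history, U.S.A.]. The membership check in this sketch is case-sensitive, and whitespace splitting does no lowercasing, so "The" would survive while "the" is dropped; lowercase the tokens first if that matters for your data.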

You may also want to override the reusableTokenStream() method in a similar fashion if you plan to reuse the TokenStream.


Shashikant Kore


Could you please have a look at my answer and comment: http://stackoverflow.com/questions/899542/problem-using-same-instance-of-indexsearcher-for-multiple-requests/1014501#1014501 – Steve Chapman Jun 18 '09 at 19:10
@Shashikant Kore - Any inputs for question - http://stackoverflow.com/questions/14554850/solrj-query-get-the-most-relevant-record-first – JHS Feb 03 '13 at 18:40