
I need to add n-gram support to my search engine, which is built on Lucene 4.4. I'm having a hard time learning how n-grams work. Could someone help me out by showing some simple steps?

Thanks in advance!

  • http://www.philippeadjiman.com/blog/2009/11/02/writing-a-token-n-grams-analyzer-in-few-lines-of-code-using-lucene/ – mindas Jul 15 '14 at 13:20

1 Answer


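Build your own analyzer using a ShingleMatrixFilter with the parameters that suit your needs. (ShingleMatrixFilter was removed in Lucene 4.0; on 4.x, ShingleFilter is the closest counterpart, and analyzers override createComponents rather than tokenStream.) For instance, here are the few lines of code needed to build a token bi-gram analyzer: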

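// Written against the older (pre-4.0) Lucene analysis API.
public class NGramAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Tokenize, then combine tokens into shingles of exactly 2 tokens
        // (min = 2, max = 2) joined by a space.
        TokenStream shingles = new ShingleMatrixFilter(new StandardTokenizer(reader), 2, 2, ' ');
        // Lowercase the shingles, then apply the English stop-word filter.
        return new StopFilter(new LowerCaseFilter(shingles), StopAnalyzer.ENGLISH_STOP_WORDS);
    }
}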

Source
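If it helps to see what "token bi-grams" means independently of Lucene, here is a minimal plain-Java sketch of the idea. It is not Lucene code and not part of the linked source; the class name `WordNGrams` and the method `ngrams` are made up for illustration, and it assumes Java 8 or later for `String.join`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not a Lucene class).
public class WordNGrams {

    // Returns every run of n consecutive tokens, joined by a single space.
    public static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> out = new ArrayList<>();
        // Slide a window of width n over the token list.
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please divide this sentence".split(" "));
        // prints [please divide, divide this, this sentence]
        System.out.println(ngrams(tokens, 2));
    }
}
```

A shingle filter produces the same kind of output, but as a token stream inside the analysis chain, so the bi-grams end up as terms in the index.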

Rishi Dua