I have a requirement to incorporate N-grams into my search engine, which uses Lucene 4.4. I'm having a hard time learning how N-grams work; could someone help me out by showing some simple steps?
Thanks in advance!
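Build your own analyzer using a ShingleFilter (from the lucene-analyzers-common module) with the parameters that suit your needs. ShingleMatrixFilter, which older examples use, was removed in Lucene 4.0, and in 4.x an analyzer implements createComponents() instead of overriding tokenStream(). For instance, here are a few lines of code for a word bigram analyzer:
import java.io.Reader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.core.*;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;
public class NGramAnalyzer extends Analyzer {
    @Override // Lucene 4.x analyzers implement createComponents, not tokenStream
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(Version.LUCENE_44, reader);
        TokenStream ts = new StopFilter(Version.LUCENE_44, new LowerCaseFilter(Version.LUCENE_44, source), StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        ShingleFilter bigrams = new ShingleFilter(ts, 2, 2); // exactly 2 words per shingle, joined by " "
        bigrams.setOutputUnigrams(false); // emit only the bigrams, not the single words
        return new TokenStreamComponents(source, bigrams);
    }
}
Because the stop words are removed before shingling, bigrams that span a removed word contain ShingleFilter's filler token ("_" by default).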
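If it helps to see what "word bigrams" means before wiring up Lucene, here is a minimal plain-Java sketch of word n-grams (the idea behind shingles). It is not Lucene code: the class WordNGrams and the method wordNGrams are hypothetical names for illustration, splitting on anything that is not a letter or digit is only a rough assumption standing in for StandardTokenizer, and no stop words are removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java illustration of word n-grams ("shingles"); not part of Lucene. */
public class WordNGrams {

    /**
     * Returns every run of n adjacent words in text, lowercased and joined by a single space.
     * Assumption: words are split on any run of characters that are not letters or digits,
     * which only roughly approximates Lucene's StandardTokenizer.
     */
    public static List<String> wordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) { // split() yields a leading "" when text starts with a separator
                words.add(w);
            }
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of n words across the list, one word at a time.
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox, fox jumps]
        System.out.println(wordNGrams("The quick brown fox jumps", 2));
    }
}
```

With n = 2 this mirrors the shape of the analyzer's output (overlapping word pairs); the real analyzer additionally drops stop words and follows StandardTokenizer's rules, so its tokens can differ.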