
I am trying to implement a custom Solr filter to stem Arabic words. The filter class is shown below, but I keep getting the error "possible analysis error" when indexing a document. I am using Khoja's stemmer.

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class CustomArbicStemFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final CustomArabicStemmer stemmer;

    public CustomArbicStemFilter(TokenStream input) {
        super(input);
        this.stemmer = new CustomArabicStemmer();
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // buffer() may be longer than the term; only the first length() chars are valid.
        String currentWord = new String(termAtt.buffer(), 0, termAtt.length());
        String stemmedWord = stemmer.stemWord(currentWord);
        // Replace the term text with the stemmed form.
        termAtt.setEmpty().append(stemmedWord);
        return true;
    }
}
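One thing that may be worth checking when reading the term: `CharTermAttribute.buffer()` returns the attribute's internal array, which is reused between tokens and can be longer than the current term. Only the first `termAtt.length()` characters are valid. The sketch below does not use Lucene at all; `TermBufferDemo`, its fixed 16-character array, and its method names are hypothetical stand-ins meant only to show how converting the whole array picks up padding and leftovers from an earlier, longer term. Whether this is the actual cause of the "possible analysis error" is an assumption that the stack trace would have to confirm.

```java
public class TermBufferDemo {
    // Stand-in for a reusable term buffer: larger than any single term it holds.
    static char[] buffer = new char[16];
    // Number of valid characters currently in the buffer.
    static int length = 0;

    // Overwrite the start of the buffer with a new term, leaving the rest untouched.
    static void setTerm(String term) {
        term.getChars(0, term.length(), buffer, 0);
        length = term.length();
    }

    // Converts the entire array, including stale and unused characters.
    static String readWholeBuffer() {
        return new String(buffer);
    }

    // Converts only the valid prefix of the array.
    static String readValidPart() {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        setTerm("كتابات"); // a longer term first
        setTerm("كتب");    // then a shorter one reusing the same buffer

        // The whole-array string always has the array's size, not the term's.
        System.out.println(readWholeBuffer().length()); // prints 16
        // The length-bounded string contains just the current term.
        System.out.println(readValidPart());            // prints كتب
    }
}
```

In a real filter the equivalent of `readValidPart()` would be `new String(termAtt.buffer(), 0, termAtt.length())` or simply `termAtt.toString()`, so that the stemmer only ever sees the current token.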

Moon123
  • What are the field types in question? I've generally seen this when trying to index string fields that are longer than about 32k characters – Binoy Dalal Jun 28 '16 at 05:52
  • I used the general text_ar field type that comes with the default schema but updated the analyzer tag to include my custom filter; below is the config I have in the Solr schema – Moon123 Jun 28 '16 at 07:07
  • Have you checked the stack trace? It should point you to the line in your code where the exception might be thrown. Also post it here. – Binoy Dalal Jun 28 '16 at 10:22

0 Answers