
I'm using Nutch with Solr to develop a search engine for Arabic texts. I need to apply a stemmer to my Arabic texts, and while searching for Solr stemmers I found that it provides these two filters:

<filter class="solr.ArabicNormalizationFilterFactory"/>

<filter class="solr.ArabicStemFilterFactory"/>

I tried them but did not understand what they do. Can anyone help me with some examples?

Also, do these two filters produce the following?

العملات is stemmed to عملة

البسَاتِين and بساتينكم are stemmed to بستان

Thank you.

Akram
sakurami

1 Answer


You can find some details here: http://lucene.apache.org/core/3_6_0/api/contrib-analyzers/org/apache/lucene/analysis/ar/ArabicStemmer.html

That says:

Stemming is defined as:

  • Removal of attached definite article, conjunction, and prepositions.
  • Stemming of common suffixes.
Walter Underwood
  • Thank you, Walter. It seems to do something close to what I need. If I want to modify the stemmer or add my own, where do you suggest I add my code? And if I apply the stemmer to the content, for example, and then search for a keyword, say "عملة", will the results contain documents with both "عملة" and "عملات" by default, or do I have to do extra configuration? Thank you again. – sakurami May 22 '12 at 05:01
  • If both of those are converted to the same stem by ArabicStemmer, then they will match. Solr will do the same conversions for indexing and for querying. – Walter Underwood May 29 '12 at 05:43
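
To make the point about indexing and querying concrete, here is a minimal sketch of how the two filters from the question might be wired into schema.xml. The field type name `text_ar`, the choice of `solr.StandardTokenizerFactory`, and the `content` field definition are assumptions for illustration, not taken from the original poster's setup. Because the `<analyzer>` element has no `type` attribute, Solr applies the same chain at index time and at query time, which is what lets two surface forms match when they reduce to the same stem.

```xml
<!-- Hypothetical field type; the name "text_ar" and the tokenizer choice are assumptions -->
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <!-- No type="index"/"query" attribute: the same chain runs for indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalization runs first (for example, it strips diacritics) so the stemmer sees a consistent spelling -->
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <!-- Light stemming: strips attached articles, conjunctions, prepositions, and common suffixes -->
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above; adjust the name to match your Nutch/Solr schema -->
<field name="content" type="text_ar" indexed="true" stored="true"/>
```

After changing the analysis chain, existing documents need to be reindexed so that the stored terms match what the query-side analysis produces. The Analysis page in the Solr admin UI can show the output of each filter for a given word, which is a quick way to check whether two forms end up as the same term.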