Need explanation on Language Stemmer of Solr

Question

I'm using Nutch with Solr to develop a search engine for Arabic texts. I need to apply a stemmer to my Arabic texts, and while searching for Solr stemmers I found that it provides these two filters:

<filter class="solr.ArabicNormalizationFilterFactory"/>

<filter class="solr.ArabicStemFilterFactory"/>

I tried them but did not understand what they do. Can anyone help me with some examples?

And do these two filters produce results like the following?

العملات Stemmed to عملة

البسَاتِين ، بساتينكم Stemmed to بستان

thank you.

score 1 · Accepted Answer · answered May 22 '12 at 00:00


You can find some details here: http://lucene.apache.org/core/3_6_0/api/contrib-analyzers/org/apache/lucene/analysis/ar/ArabicStemmer.html

That says:

Stemming is defined as:

Removal of attached definite article, conjunction, and prepositions.
Stemming of common suffixes.
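To show where the two filters from the question would sit, here is a minimal sketch of a field type for schema.xml. The name `text_ar` and the choice of `solr.StandardTokenizerFactory` are assumptions for illustration, not something the linked Javadoc prescribes. Because the `<analyzer>` element has no `type` attribute, the same chain is applied at index time and at query time.

```xml
<!-- Hypothetical field type; the name "text_ar" is an assumption. -->
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <!-- No type="index"/"query" attribute: one chain for indexing and querying. -->
  <analyzer>
    <!-- Assumed tokenizer; splits the text into word tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalization runs first, in the same order as in the question. -->
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <!-- Then the stemmer strips the attached prefixes and common suffixes. -->
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>
```

To see what each filter actually outputs for a given word, the Analysis page of the Solr admin UI shows the tokens after every step of the chain.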


Walter Underwood


Thank you, Walter. It seems to do something close to what I need. If I want to modify the stemmer or add my own, where do you suggest I put my code? And if I apply the stemmer to the content field, for example, and then search for a keyword, say "عملة", will the results contain documents with both "عملة" and "عملات" by default, or do I have to do extra configuration? Thank you again. – sakurami May 22 '12 at 05:01
If both of those are converted to the same stem by ArabicStemmer, then they will match. Solr will do the same conversions for indexing and for querying. – Walter Underwood May 29 '12 at 05:43
