I am using Solr to implement search for an Arabic site, and I want to normalize plural words into singular words and vice versa, so that searching for "كتاب" returns any document that contains "كتاب" or "كتب". Is this possible in Solr? I would highly appreciate your input.
Viewed 191 times
1
-
Probably helpful http://stackoverflow.com/questions/10681281/need-explanation-on-language-stemmer-of-solr – cheffe Jun 20 '16 at 05:33
-
There is [a package in Lucene](https://lucene.apache.org/core/6_0_0/analyzers-common/index.html?org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html) (and therefore available in Solr) that is dedicated to the Arabic language. Unfortunately, I do not know enough about Arabic to tell you whether this is what you need. – cheffe Jun 20 '16 at 05:35
-
Thanks, I tried it, but with no luck. – Moon123 Jun 20 '16 at 08:33
2 Answers
1
There was a presentation at the Lucene/Solr Revolution by Ramzi Alqrainy about Solr's support for Arabic and common issues. It is now available online.

Alexandre Rafalovitch
1
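You need a stemmer that reduces words to their roots at both index time and query time; try Khoja's stemmer or Assem's stemmer.
By default, Solr uses a light stemmer for Arabic, which only strips common prefixes and suffixes, so it cannot conflate a broken plural such as "كتب" with its singular "كتاب", where the change is inside the word. It seems you need a deeper stemmer that extracts the root of the word.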

Assem
-
Thanks a lot. I already tried to create my own custom Arabic stem filter, but I do not know what the correct implementation of the incrementToken() method would be. I would highly appreciate your input. – Moon123 Jun 26 '16 at 11:25
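On the incrementToken() question, the usual shape in a Lucene TokenFilter is: call input.incrementToken(), return false if it returns false, otherwise rewrite the CharTermAttribute in place and return true. The part that differs between stemmers is the rewriting step, so here is a minimal sketch of just that step with no Lucene dependency. It is not Khoja's algorithm; it is a plain dictionary lookup, and the class name BrokenPluralNormalizer, the method normalize, and the two dictionary entries are all hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene or Solr): maps known broken
 * plurals to their singular form and leaves every other token unchanged.
 */
final class BrokenPluralNormalizer {

    // Illustrative entries only; a real deployment would load a full lexicon
    // or use a root-based stemmer instead of a hand-written map.
    private static final Map<String, String> PLURAL_TO_SINGULAR = new HashMap<>();
    static {
        PLURAL_TO_SINGULAR.put("كتب", "كتاب");  // books -> book
        PLURAL_TO_SINGULAR.put("رجال", "رجل");  // men -> man
    }

    /** Returns the singular form for a known plural, otherwise the token itself. */
    static String normalize(String token) {
        return PLURAL_TO_SINGULAR.getOrDefault(token, token);
    }

    // Inside a TokenFilter subclass, the call site would look roughly like this
    // (sketch only, assuming termAtt = addAttribute(CharTermAttribute.class)):
    //
    //   if (!input.incrementToken()) return false;
    //   String normalized = BrokenPluralNormalizer.normalize(termAtt.toString());
    //   termAtt.setEmpty().append(normalized);
    //   return true;

    public static void main(String[] args) {
        // Both the query term and the indexed term end up as the same string,
        // which is what makes the match possible.
        System.out.println(normalize("كتب"));
        System.out.println(normalize("كتاب"));
    }
}
```

Whatever rewriting you choose, the same filter has to run in both the index and query analyzers, otherwise the indexed form and the searched form will not line up.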