We have a Solr build that currently works only with English, and we need to add Arabic support to it. The Solr Wiki does not give much detail on how to get started.

These are the steps I have taken so far.

Added the following to schema.xml

<fieldType name="text_general_arabic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

Defined a field in schema.xml

<field name="البرتغالية" type="text_general_arabic" indexed="true" stored="true"/>

FYI, I copied the Arabic text from Google Translate in the browser and pasted it in.

Then I created a CSV file in Notepad, saved it as a Unicode file named Arabic.csv, and gave it the following field name:

البرتغالية

When I try to index the file using the following cURL command:

D:\>curl http://localhost:8080/solr/coll9/update/csv -F "stream.file=D:\Arabic.csv" -F "commit=true" -F "optimize=true" -F "encapsulate="" -F "keepEmpty=true"

I get an undefined field error, and I don't know what I am doing wrong.

UPDATE: When I try the same thing with an XML file instead of a CSV file, it works.

Mitra

1 Answer


First, I would recommend changing all your field names to English if possible, since it avoids some confusion. You might also consider following the advice in this answer regarding field naming for the same data in different languages.

The CSVLoaderBase::load() function uses Java's BufferedReader class under the covers and doesn't specify an encoding. I'm guessing the default encoding is not compatible with how the Arabic file was saved, as noted in this question.
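If the encoding is indeed the culprit, one way to test the guess is to re-save the CSV as UTF-8 without a byte-order mark before posting it. The Python sketch below is only an illustration, not something Solr ships: it assumes Notepad's "Unicode" option wrote UTF-16 with a BOM and that the loader ends up decoding the bytes as UTF-8. The helper names `convert_to_utf8` and `first_header_field`, the output file name, and the sample row are hypothetical.

```python
import codecs
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 without a byte-order mark."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Drop a leading BOM character in case the decoder left one in place.
    text = text.lstrip("\ufeff")
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


def first_header_field(path, encoding="utf-8"):
    """Return the first column name as a reader using `encoding` would see it."""
    with open(path, "r", encoding=encoding, errors="replace", newline="") as f:
        return f.readline().rstrip("\r\n").split(",")[0]


FIELD_NAME = "البرتغالية"  # field name from the question's schema.xml

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Arabic.csv")
    dst = os.path.join(tmp, "Arabic-utf8.csv")  # hypothetical output name

    # Simulate the assumed Notepad "Unicode" save: UTF-16 with a BOM.
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write(FIELD_NAME + "\r\nsample\r\n")

    # Header as seen by a reader that assumes UTF-8.
    garbled = first_header_field(src)

    convert_to_utf8(src, dst)
    clean = first_header_field(dst)
    with open(dst, "rb") as f:
        has_bom = f.read(3) == codecs.BOM_UTF8

print("header matches before conversion:", garbled == FIELD_NAME)
print("header matches after conversion:", clean == FIELD_NAME)
print("converted file starts with a BOM:", has_bom)
```

If the header only matches the schema field after conversion, that would explain an undefined field error for the CSV while the XML file, which declares its own encoding, indexes fine.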

In Solr 4.0, schema.xml comes with predefined field types for each language. More language-specific info is here. I think all these filters are also available in 3.6. The Solr 4 schema.xml example is here.

saarp
  • Hey, I'm using Solr 3.6. Can you please elaborate on what you are trying to explain? It works fine for me when using an XML file instead of CSV. – Mitra Dec 21 '12 at 09:31
  • I can index Arabic text using the predefined field types as you suggested; however, the normalization filters are not working. I have a separate [question here](http://stackoverflow.com/questions/27485205/arabic-normaliztion-in-solr). Any idea what could be missing? – MoustafaAAtta Dec 15 '14 at 13:39