Xapian vs Lucene.Net - Arabic documents text search

Question

I am facing the problem of searching text across a large number of Arabic-language documents (PDF and Doc files) in C# .NET.

After a lot of searching, I came up with two solutions.

First, Lucene.Net, where I ran into the following issues:

1- An Arabic analyzer to use with Lucene.Net. I found this, but I don't know yet whether it will work.

2- Extracting the text from the documents (about 6000 PDF and Doc files). I found Tika, which I will use in .NET with the help of IKVM. However, even assuming this solution works, I don't know what the performance will be.

Second, Xapian. I moved to this solution in order to make use of the Omega library, but still found some issues:

1- Will Xapian work with Arabic content, or will it need an Arabic analyzer too? If so, how can I work around this problem?

In short, I can't decide which solution to go with, given the Arabic content and a fairly large amount of data.

Any help or suggestion is much appreciated.

Thanks,

Samer

score 0 · Answer 1 · answered Jul 14 '11 at 07:01

If you want to use nLucene you have to create an Arabic analyzer, but I'm using Solr and it's working fine with the Arabic language. Check this topic
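To make "create an Arabic analyzer" a bit more concrete, here is a minimal sketch of the normalization step such an analyzer usually performs before indexing and querying. It is written in Python only to stay self-contained and is not Lucene.Net, Solr, or Xapian code. The function names `normalize_arabic` and `tokenize` are hypothetical, and the rule set (strip harakat and tatweel, fold alef variants, teh marbuta, and alef maksura) is an assumption loosely based on common Arabic search normalization, not a complete analyzer (no stemming, no stop words).

```python
import re

# Arabic diacritics (harakat), U+064B..U+0652: tanween, fatha, damma,
# kasra, shadda, sukun. Assumed rule: drop them before indexing.
_DIACRITICS = re.compile("[\u064B-\u0652]")

# Tatweel (kashida) only stretches a word visually.
_TATWEEL = "\u0640"

# Assumed letter folding, applied to both documents and queries.
_CHAR_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0629": "\u0647",  # teh marbuta           -> heh
    "\u0649": "\u064A",  # alef maksura          -> yeh
})


def normalize_arabic(text):
    """Return text with diacritics and tatweel removed and letters folded."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return text.translate(_CHAR_MAP)


def tokenize(text):
    """Normalize first, then split on runs of word characters.

    Normalizing first matters: diacritics are combining marks, which
    \\w does not match, so they would otherwise split a word apart.
    """
    return re.findall(r"\w+", normalize_arabic(text))
```

The important point is that the same function has to run on the indexed text and on the user's query, otherwise a query typed without diacritics will not match a document written with them.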

Peyman

So can Solr do all 3 steps: extracting text, then indexing, then searching? – Samer Makary Jul 14 '11 at 10:54