
I am facing the problem of full-text search across a large number of Arabic documents (PDF and Doc files) in C# .NET.

After a lot of searching, I came up with two solutions.

First, Lucene.Net, where I faced the following issues:

1- I need an Arabic analyzer to use with Lucene.Net. I found this, but I don't know yet whether it will work.

2- Extracting the text from the documents (about 6000 PDF and Doc files). I found Tika, which I will use from .NET with the help of IKVM. However, even assuming this solution works, I don't know what the performance will be.

Second, Xapian. I moved to this solution in order to make use of the Omega library, but I still found an issue:

1- Will Xapian work with Arabic content, or will it need an Arabic analyzer too? If so, how can I work around this problem?

I can't decide which solution to go with, given the Arabic content and the fairly large amount of data.

Any help or suggestions would be much appreciated.

Thanks,

Samer

Samer Makary

1 Answer


If you want to use Lucene.Net, you have to create an Arabic analyzer, but I'm using Solr and it works fine with the Arabic language. Check this topic
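
To make concrete what an Arabic analyzer has to do before indexing, here is a minimal sketch of the normalization step in plain C#. It is not Lucene.Net's or Solr's API: the class name `ArabicNormalizerSketch` and its `Normalize` method are hypothetical, and the rules (strip diacritics and tatweel, fold alef variants, dotless yeh, and teh marbuta) are a common simplification, not a complete analyzer with stemming or stop words.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class ArabicNormalizerSketch
{
    // Folds common Arabic orthographic variants so that differently
    // written forms of the same word produce the same index term.
    public static string Normalize(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Drop diacritics (tashkeel), U+064B through U+0652.
            if (c >= '\u064B' && c <= '\u0652') continue;
            // Drop tatweel (kashida), a purely decorative elongation.
            if (c == '\u0640') continue;

            switch (c)
            {
                case '\u0622': // alef with madda
                case '\u0623': // alef with hamza above
                case '\u0625': // alef with hamza below
                    sb.Append('\u0627'); // bare alef
                    break;
                case '\u0649': // dotless yeh (alef maksura)
                    sb.Append('\u064A'); // yeh
                    break;
                case '\u0629': // teh marbuta
                    sb.Append('\u0647'); // heh
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // A vocalized spelling and a plain spelling of the same name.
        string vocalized = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F";
        string plain = "\u0627\u062D\u0645\u062F";
        Console.WriteLine(Normalize(vocalized) == Normalize(plain)); // True
    }
}
```

Whatever engine you choose, the same normalization has to be applied to both the indexed text and the user's query, otherwise the folded terms will not match.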

Peyman