
I've built a list that contains some Arabic words, added a record for this list to the .def file, and put the two files in the same directory. Then in my Java code I've written:

FeatureMap params = Factory.newFeatureMap();
params.put("encoding", "UTF-8");       
params.put("listsURL","file:/D:/ThesisProj/Gazetteers/lists.def");
LanguageAnalyser gazetteer = (LanguageAnalyser)Factory.createResource("arabic.ArabicGazetteer",params);
gazetteer.init();

When the list and the file I match words against contain English words, matching works: the resulting annotations include Lookup annotations for the matched words. But when I switch to Arabic and have only Arabic words in both the list and the file being compared, there are no Lookup annotations in the results. Can anyone help me make GATE recognize Arabic characters and match them? I suspect UTF-8 is not suitable.

dangee1705
Suzn CB

1 Answer


It could be a character encoding issue. You may have created the list containing the Arabic words using an encoding other than UTF-8.

Also check the encoding of the documents; it may be broken as well.
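One way to check both files outside GATE is to decode their bytes strictly as UTF-8 and see whether that fails. The following is a minimal sketch using only the Java standard library; the class name `EncodingCheck` and the method `isValidUtf8` are made up for illustration and are not part of GATE. A file saved in a legacy Arabic code page will almost always fail this check, while a pure ASCII file will pass, so a pass only means the bytes are consistent with UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of GATE.
class EncodingCheck {

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Report errors instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the files to check as arguments, e.g. the gazetteer list and the document.
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            String verdict = isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

Run it with the paths of the list file and the document as arguments. If either one is reported as not valid UTF-8, the `encoding` parameter passed to the gazetteer does not match how that file was saved.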

GATE is definitely capable of handling Arabic language. You can easily verify if everything is ok in the GUI.

See two simple screenshots created with the GATE plugin Language: Arabic

Check if the gazetteer list looks ok:

[Screenshot: Arabic gazetteer]

Check if the document looks ok:

[Screenshot: Arabic document]

dedek
  • Thanks for the reply. I tried them in the GUI, but it doesn't display them correctly; it shows strange characters. How can I check the encoding of both the gazetteer and the document? I set UTF-8 as a parameter in the gazetteer, and the document is a .txt file. – Suzn CB Jul 31 '19 at 21:24
  • You need to use the same encoding in your text editor and in GATE. Which text editor do you use to edit your *lists* and documents? What encoding is it using? – dedek Aug 01 '19 at 07:00
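If the editor turns out to save files in some other encoding, one option is to re-save them as UTF-8 so they match the `encoding` value given to GATE (the other is to pass the editor's encoding to GATE instead). Below is a minimal sketch of such a conversion with the Java standard library. The class `ToUtf8` and its method are hypothetical names, and the source encoding must be supplied by you; a name such as `windows-1256` is only an example and is available only if your Java runtime includes that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of GATE.
class ToUtf8 {

    // Decodes the bytes using the given source charset and re-encodes them as UTF-8.
    static byte[] transcode(byte[] input, Charset source) {
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ToUtf8 <input file> <source charset> <output file>");
            return;
        }
        // Charset.forName throws if the runtime does not know the charset name.
        Charset source = Charset.forName(args[1]);
        byte[] converted = transcode(Files.readAllBytes(Paths.get(args[0])), source);
        Files.write(Paths.get(args[2]), converted);
    }
}
```

The conversion only gives correct text if the source charset you name is the one the file was really saved in; a wrong guess produces readable-looking but incorrect characters rather than an error.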