
I have Arabic PDF files, and something seems to be wrong with their encoding.

When I search a PDF for a word that appears in it, no results are found.

When I export the PDF contents to Excel using other programs, the data is exported in a strange encoding.

When I copy text from the PDF into Notepad, Notepad also displays it in a strange encoding.

I am developing a solution that will use these PDFs (about 950 files), so I must find a way to fix the encoding.

Thanks in advance.

M_1100

1 Answer


Disclaimer: I've never edited an Arabic file.

How did you export the .pdf contents to Excel?

You cannot open a .pdf file directly with Word, Excel, WordPad, or Notepad. The strange encoding you're seeing is most probably the specific encoding of a selected font resource.
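To check whether text copied or extracted from one of these PDFs is actually Arabic rather than garbled, a rough heuristic is to count how many letters belong to the Arabic script. This is only a sketch: the names `arabic_ratio` and `looks_garbled` and the 0.5 threshold are my own assumptions, and it works on text you have already extracted by some other means.

```python
import unicodedata


def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters whose Unicode name starts with 'ARABIC'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith("ARABIC")
    )
    return arabic / len(letters)


def looks_garbled(text: str, threshold: float = 0.5) -> bool:
    """Flag text that is mostly non-Arabic letters (assumed threshold: 0.5).

    Empty text has a ratio of 0.0, so it is flagged as well.
    """
    return arabic_ratio(text) < threshold
```

Running this over the text of each of the 950 files would at least tell you which ones need fixing; it does not repair anything by itself.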

You can use this tool to detect the encoding, but I really advise you to read the bare minimum about Unicode and character sets.
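As a small illustration of why character sets matter, here is a sketch of how the same bytes turn into unreadable text when they are decoded with the wrong encoding. The choice of `cp1256` (Windows Arabic) and `latin-1` is just an assumption for the demo. Your PDFs may be failing for a different reason, such as a font whose glyph codes carry no mapping back to Unicode, in which case re-decoding like this will not help.

```python
# The Arabic word "salam", stored as bytes in the Windows Arabic code page.
original = "سلام"
raw = original.encode("cp1256")

# Reading those bytes with the wrong character set gives mojibake.
garbled = raw.decode("latin-1")

# Reversing the wrong step and decoding correctly restores the text,
# which only works because latin-1 maps every byte to a character.
restored = garbled.encode("latin-1").decode("cp1256")

print(garbled)
print(restored == original)
```

If the strange text you see in Notepad can be restored this way, the problem is a plain encoding mismatch; if not, the fault is more likely inside the PDF's fonts.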

From there, considering the number of files involved, a good solution seems to be PyODConverter.

For a smaller number of files, Free PDF to Word Converter will take care of your needs.

Joao Figueiredo
  • Dear Joao, my main problem is fixing the PDF file itself. When I open it in any PDF reader I can read it easily, but when I search for any word it tells me "no results found". – M_1100 Nov 21 '11 at 17:56
  • But have you already confirmed what encoding those .pdf files are using? Check this question, maybe it'll put you on track: http://superuser.com/questions/119393/search-pdfs-with-non-standard-character-encodings – Joao Figueiredo Nov 21 '11 at 18:10
  • Yes, this is exactly my situation. Thanks. – M_1100 Nov 21 '11 at 19:08