Datacompy Library in Python writes Arabic letters as gibberish in the report

Question

I am using the Datacompy library in Python to compare two dataframes that contain Arabic data. The data is encoded successfully with the "cp1256" codec, and Python displays the Arabic letters correctly on the console. However, when Datacompy compares the data and produces the comparison report, the Arabic letters are converted to something like ÃÃ¡Ãš and ÃÃ‘Ã¥Ã”Ã¦Ãˆ in the Report.txt file.

How can I get the Arabic letters written correctly to the file? Any help would be appreciated. Thanks.

You are presumably opening that text file with a text editor. Does it know that the text encoding is cp1256? Using codepages is old technology -- do you have good reasons to not use the more universal (and default) UTF8? — Jongware, Jul 29 '20 at 13:39
@usr2564301 When I use UTF-8, the comparison script doesn't recognize all of the Arabic characters; that's why I am using cp1256. The output is written to a basic .txt file. Is there a way to let the output destination know which encoding is being used? — Zeyad Al Mothafar, Jul 29 '20 at 16:09
No, that's not an option for plain text files; a text *viewer* must be told what code page to use. It *is* pretty much the single reason UTF8 was designed. That said: your toolchain appears to work correctly, it's just that the very editor you use to read the output fails. — Jongware, Jul 29 '20 at 19:53
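To illustrate the point made in the comments, here is a minimal sketch of how the same bytes look right or wrong depending on the encoding the reader assumes. It is not the asker's actual script: `report_text` is a stand-in string (the assumption being that the report is an ordinary Python `str` before it is written), and the file names are hypothetical. Writing the file as UTF-8 with a byte order mark (`utf-8-sig`) is one common way to help editors detect the encoding on their own, though whether a given editor honours it is not guaranteed.

```python
import os
import tempfile

# Stand-in for the comparison report; assumed to be a plain Python str.
report_text = "الاسم: محمد\n"

# Hypothetical output locations in a scratch directory.
workdir = tempfile.mkdtemp()
cp1256_path = os.path.join(workdir, "report_cp1256.txt")
utf8_path = os.path.join(workdir, "report_utf8.txt")

# Write the report using the cp1256 code page.
with open(cp1256_path, "w", encoding="cp1256") as f:
    f.write(report_text)

# A viewer that assumes a Western code page decodes the same bytes
# into unrelated Latin characters (the "gibberish").
with open(cp1256_path, "r", encoding="latin-1") as f:
    misread = f.read()

# A viewer that is told the file is cp1256 recovers the Arabic text.
with open(cp1256_path, "r", encoding="cp1256") as f:
    correct = f.read()

# Alternative: write UTF-8 with a BOM so the encoding is easier to detect.
with open(utf8_path, "w", encoding="utf-8-sig") as f:
    f.write(report_text)

with open(utf8_path, "r", encoding="utf-8-sig") as f:
    roundtrip = f.read()

print(repr(misread))
print(correct == report_text, roundtrip == report_text)
```

If the cp1256 file is kept, the editor used to open it has to be switched to "Arabic (Windows-1256)" or the equivalent setting by hand; the file itself carries no record of its encoding.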


0 Answers