I can't correctly read Excel data containing accented characters with pandas.
import pandas as pd
data = pd.read_excel("C:/Users/XXX/Desktop/Help_me_plz.xlsx", encoding='utf-8')
This is what I obtain:
   ID          Titre                                   Entité
0  2020044459  SOAPPRO - Problème ouverture documents  Root entity > Utilisateurs
1  2020048819  Probleme de conformité Smartphone KMSE  Root entity > Utilisateurs
As you can see, the accents are not interpreted correctly and appear as strange characters.
I searched on the Internet and tried several things:
Converting the file to CSV
Converting the file to various encodings
Opening the file with Notepad, but the problem is still there
I even tried the following code, which returns the wrong output:
from unidecode import unidecode
print(unidecode('Entité'))
I was expecting Entité, but it gave me the following output: EntitA(c).
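For what it's worth, an output of EntitA(c) is what unidecode would produce if the string actually contained the two characters "Ã©" instead of "é", which is the usual symptom of UTF-8 bytes being decoded as Latin-1 (or cp1252) somewhere along the way. The snippet below is only a minimal sketch under that assumption, which is not confirmed for this file; the helper name fix_mojibake is hypothetical and not part of pandas or unidecode.

```python
def fix_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was wrongly decoded as Latin-1 (assumed cause)."""
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text is not garbled in this particular way; leave it unchanged.
        return text


# Reproduce the suspected garbling: UTF-8 bytes read back as Latin-1.
garbled = "Entité".encode("utf-8").decode("latin-1")
print(repr(garbled))

# Apply the reverse transformation.
print(fix_mojibake(garbled))
```

If this assumption holds, the same function could be applied to the text columns after loading, for example data["Titre"].map(fix_mojibake), but that would only mask the problem rather than explain where the wrong decoding happens.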
Is there a way to interpret the accents correctly, or to identify the right encoding to use?