I am trying to read this file using pandas in UTF-8 encoding.

English alphabetic characters are read correctly, but non-English characters are not. I tried changing the encoding from utf8 to cp1252 and ASCII, but nothing worked.

For more detail, see the images.

UTF8 encoded

ascii encoded

NoobN3rd

1 Answer

>>> import pandas as pd
>>> file = "D:\\Python\\SO3\\data\\62015078.xlsx"
>>> data = pd.read_excel(file,encoding='utf8')
>>> data.en
0     Release note
1    Sales package
2        Schematic
3         Software
4        Statistic
5            Video
Name: en, dtype: object
>>> data.ja
0    リリースノート
1    販売パッケージ
2        回路図
3     ソフトウェア
4         統計
5         動画
Name: ja, dtype: object
>>> data.zh
0    版本说明
1     销售包
2     示意图
3      软件
4      统计
5      视频
Name: zh, dtype: object

The code snippet works. The character you see in your output is the replacement character (U+FFFD). You need to set a suitable console/terminal font:

output with proper terminal font

The same window with a common console font:

output with improper terminal font
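To check whether the data itself is intact regardless of how the console renders it, one option is to inspect the code points of a value instead of its glyphs. The following is only a minimal sketch: the helper name `describe` is hypothetical, and `sample` is a literal copied from the `ja` column above rather than read from the Excel file. If the string does not contain U+FFFD, that suggests the problem is in the display and not in the data.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Assumption: a value like one from the `ja` column; in a real session
# this could be something like data.ja[4] instead of a literal.
sample = "統計"

# Print code points and names; this output is plain ASCII, so it is
# readable even when the console font lacks glyphs for the characters.
for code_point, name in describe(sample):
    print(code_point, name)

# A string that was damaged while decoding would typically contain
# the replacement character U+FFFD itself.
has_replacement = "\ufffd" in sample
print("contains U+FFFD:", has_replacement)

# ascii() escapes every non-ASCII character, another font-independent view.
print(ascii(sample))
```

The code point and `ascii()` output use only ASCII characters, so they stay legible in a console whose font cannot draw the original text.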

JosefZ
  • It appears that `pd.read_excel` works regardless of the supplied `encoding`, for instance with `data = pd.read_excel(file,encoding='none')` or even `data = pd.read_excel(file)`. – JosefZ May 30 '20 at 19:21