How can I encode/decode \xbe in python?

Question

I have an Excel file that I am reading in Python using the xlrd module. I extract the values from each row, add some additional data, and write it all out to a new text file. However, I am running into an issue with cells whose text contains the fraction 3/4. Python reads the value as \xbe, and each time I encounter it, I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbe' in position 317: ordinal not in range(128)

I am converting the list of values from each row into strings, and I have tried the following without success:

row_vals_str = [unicode(str(val), 'utf-8') for val in row_vals]
row_vals_str = [str(val).encode('utf-8') for val in row_vals]
row_vals_str = [str(val).decode() for val in row_vals]

Each time I hit the first occurrence of the 3/4 fraction I get the same error.

How can I convert this to something that can be written to text?

I came across this thread but didn't find an answer: How to convert \xXY encoded characters to UTF-8 in Python?

It's probably encoded with the equivalent of latin-1. However, you'll still have to do something useful with the character. — John Szakmeister, Nov 14 '16 at 21:50
If you're having Unicode problems, it's better to use Python 3. — Nick T, Nov 14 '16 at 22:17
thanks for the replies, @NickT I'm unfortunately stuck using python 2.7 at the moment — kflaw, Nov 14 '16 at 22:28

score 1 · Answer 1 · answered Nov 14 '16 at 22:14

The byte 0xBE is the ¾ character in Latin-1 (and in cp1252). You can decode it with latin1, or replace it with something else if you do not need the character itself.

http://www.codetable.net/hex/be

>>> '\xbe'.decode('latin1')
u'\xbe'
>>> '\xbe'.decode('cp1252')
u'\xbe'


>>> '\xbe this is a test'.replace('\xbe','3/4')
'3/4 this is a test'
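The two ideas above can be combined: decode first, then either keep the character and encode it as UTF-8, or replace it with an ASCII spelling. The following is only a minimal sketch; the sample string and variable names are made up for illustration, and it assumes the input really is a Latin-1 byte string.

```python
# Sketch only: 0xBE is the "three quarters" sign in both latin-1 and cp1252.
raw = b'\xbe cup of sugar'          # hypothetical byte string from a legacy source

text = raw.decode('latin1')         # bytes -> unicode text
assert text[0] == u'\xbe'           # U+00BE VULGAR FRACTION THREE QUARTERS

# Option 1: keep the character and encode it as UTF-8 for output.
utf8_bytes = text.encode('utf-8')

# Option 2: swap it for a plain ASCII spelling.
ascii_text = text.replace(u'\xbe', u'3/4')
```

Doing the replace on the decoded text (rather than on the raw bytes) avoids depending on which encoding the bytes happen to be in.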

galaxyan


Thanks for the reply! I tried your first and third options and still got the same error, although in the first case I applied it to the entire string that contained \xbe. – kflaw Nov 14 '16 at 22:29
@kflaw if the first decode works, you could decode first and then replace '\xbe' with '3/4' – galaxyan Nov 14 '16 at 22:33
@kflaw because of the sloppy Unicode handling that Python 2 encouraged, it might be an error within `xlrd` that you can't fix without patching it (have you read their documentation about opening files with various encodings?). You could also try a different package `openpyxl`. – Nick T Nov 14 '16 at 22:45
@NickT thanks for the reply I think you are right about it being an issue with xlrd I will look into it – kflaw Nov 14 '16 at 22:48

score 0 · Answer 2 · answered Nov 15 '16 at 17:28

What actually ended up working was to decode the string, then encode it, then replace:

row_vals_str = [str(val).decode('latin1').encode('utf8').replace(r'\xbe', '3/4') for val in row_vals]
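For comparison, here is a rough sketch of a variant that avoids calling str() on unicode values at all and lets the file object do the UTF-8 encoding. It assumes row_vals holds plain cell values (unicode text and numbers) rather than xlrd Cell objects; the to_text helper, the sample row, and the output path are all hypothetical.

```python
import io
import os
import tempfile

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

def to_text(val):
    """Return val as unicode text without an implicit ASCII encode (hypothetical helper)."""
    if isinstance(val, text_type):
        return val
    if isinstance(val, bytes):
        # Assumption: stray byte strings are Latin-1 encoded.
        return val.decode('latin1')
    return text_type(val)

row_vals = [u'\xbe inch bolt', 12.5, 3]   # made-up row for illustration

# The replace is optional: UTF-8 can represent the fraction character as-is.
line = u'\t'.join(to_text(v) for v in row_vals).replace(u'\xbe', u'3/4')

out_path = os.path.join(tempfile.mkdtemp(), 'rows.txt')
with io.open(out_path, 'w', encoding='utf-8') as fh:
    fh.write(line + u'\n')
```

io.open behaves the same in Python 2.7 and Python 3, so the encoding is stated once, at the point where the text leaves the program.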

kflaw

