I have an Excel file that I am reading in Python using the xlrd module. I extract the values from each row, add some additional data, and write it all out to a new text file. However, I am running into an issue with cells whose text contains the 3/4 fraction character. Python reads the value as \xbe, and each time I encounter it I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbe' in position 317: ordinal not in range(128)

I am converting the list of values from each row into strings, and I have tried the following without success:

row_vals_str = [unicode(str(val), 'utf-8') for val in row_vals]
row_vals_str = [str(val).encode('utf-8') for val in row_vals]
row_vals_str = [str(val).decode() for val in row_vals]

Each time I hit the first occurrence of the 3/4 fraction I get the same error.

How can I convert this to something that can be written to text?

I came across this thread but didn't find an answer: How to convert \xXY encoded characters to UTF-8 in Python?

kflaw

2 Answers

1

The byte \xbe is the three-quarters fraction character (U+00BE) in Latin-1, and also in cp1252. You can decode the string with latin1, or replace the character with something else if you do not need it.

http://www.codetable.net/hex/be

>>> '\xbe'.decode('latin1')
u'\xbe'
>>> '\xbe'.decode('cp1252')
u'\xbe'


>>> '\xbe this is a test'.replace('\xbe','3/4')
'3/4 this is a test'
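
The error in the question comes from an implicit ASCII encode, which happens when text containing u'\xbe' is passed through str() or written to a file opened without an encoding. Below is a minimal sketch of one way around it, assuming the cell values are already Unicode text or numbers, and that any byte strings are Latin-1 as above. The helper names `to_text` and `write_rows`, the sample rows, and the output path are made up for illustration; they are not part of xlrd.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def to_text(val):
    """Return val as Unicode text without an ASCII-only str() call."""
    if isinstance(val, bytes):
        # Assumption: byte strings are Latin-1 encoded, as in the answer above.
        return val.decode('latin1')
    if isinstance(val, type(u'')):
        return val
    # Numbers and other objects: format straight into a Unicode string.
    return u'%s' % (val,)


def write_rows(path, rows, sep=u'\t'):
    """Write each row as one separator-joined line, encoded as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as out:
        for row in rows:
            out.write(sep.join(to_text(v) for v in row) + u'\n')


# Hypothetical sample data standing in for values read from the sheet.
sample_rows = [[u'\xbe cup sugar', 2.0], [u'plain text', 7]]
out_path = os.path.join(tempfile.mkdtemp(), 'rows.txt')
write_rows(out_path, sample_rows)
```

With this approach the text stays Unicode until the file object encodes it, so no replacement of the fraction character is needed unless the output has to be pure ASCII.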
galaxyan
  • Thanks for the reply! I tried your first and third options and still got the same error, although in the first case I applied it to the entire string that contained \xbe. – kflaw Nov 14 '16 at 22:29
  • @kflaw if the first decode works, you could decode first and then replace '\xbe' with '3/4'. – galaxyan Nov 14 '16 at 22:33
  • @kflaw because of the sloppy Unicode handling that Python 2 encouraged, it might be an error within `xlrd` that you can't fix without patching it (have you read their documentation about opening files with various encodings?). You could also try a different package `openpyxl`. – Nick T Nov 14 '16 at 22:45
  • @NickT thanks for the reply. I think you are right about it being an issue with xlrd; I will look into it. – kflaw Nov 14 '16 at 22:48
0

What actually ended up working was to decode the string, then encode it, then replace:

row_vals_str = [str(val).decode('latin1').encode('utf8').replace(r'\xbe', '3/4') for val in row_vals]
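
Note that r'\xbe' is a raw string, so the replace above matches the four literal characters backslash, x, b, e rather than the single fraction character. A variant that works directly on Unicode text might look like the sketch below. It assumes the values are Unicode strings or numbers; the `FRACTIONS` table, the `asciify_fractions` helper, and the sample values are hypothetical names introduced here for illustration.

```python
# -*- coding: utf-8 -*-

# Map the Latin-1 fraction characters to plain ASCII equivalents.
FRACTIONS = {
    u'\xbc': u'1/4',
    u'\xbd': u'1/2',
    u'\xbe': u'3/4',
}


def asciify_fractions(text):
    """Replace each fraction character in text with its ASCII spelling."""
    for char, replacement in FRACTIONS.items():
        text = text.replace(char, replacement)
    return text


# Hypothetical sample values standing in for one row from the sheet.
row_vals = [u'\xbe inch pipe', u'no fraction', 12.5]

# u'%s' formatting keeps everything Unicode, so no implicit ASCII encode occurs.
row_vals_str = [asciify_fractions(u'%s' % (val,)) for val in row_vals]
```

After the replacement, the strings in this example contain only ASCII characters, so they can be written with a plain file object.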
kflaw