2

I am trying to read from a CSV file and write the data into an XML file, but I am encountering:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8a in position 87: ordinal not in range(128)

My question is: what is the best way to ignore this kind of error and continue processing the data set? After reading other similar questions, I added # -*- coding: utf-8 -*- to my file, but it didn't help.

user1195192
  • Properly decode the input, e.g. read as bytes and then do `input.decode("utf-8")` (if your input is utf-8). – syntonym Sep 08 '16 at 15:01
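To illustrate the comment above, here is a minimal Python 3 sketch of decoding raw bytes explicitly. The helper name `decode_bytes` and the `cp1252` fallback are assumptions made for illustration; the question never states the file's real encoding, so substitute whatever your data actually uses.

```python
def decode_bytes(raw, encodings=("utf-8", "cp1252")):
    """Decode `raw` with the first encoding in `encodings` that succeeds."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")


print(decode_bytes(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_bytes(b"\x8akoda"))     # invalid as UTF-8, falls back to cp1252
```

Guessing a fallback encoding can silently produce the wrong characters, so this is only reasonable when you have some idea where the file came from.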

2 Answers

1

You can try opening the CSV with codecs, so the file is decoded as UTF-8 while it is read:

import codecs

# file_name is the path to the CSV; text is decoded as UTF-8 as it is read
ifile = codecs.open(file_name, 'r', 'utf8')

Because each line will end with a trailing '\n', you will need to apply line.rstrip() when looping through the lines.

Note: Please don't try to convert the decoded values with str(). In Python 2 that implicitly encodes them as ASCII, so you will hit another error (a UnicodeEncodeError) there.
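Since the question asks how to skip bad bytes and keep going, here is a rough sketch of the same codecs.open approach with an explicit error handler, written for Python 3. The helper `read_lines_lenient` and the throwaway demo file are made up for illustration; `errors` is a real parameter of codecs.open.

```python
import codecs
import os
import tempfile


def read_lines_lenient(path, encoding="utf-8", errors="replace"):
    """Return the decoded lines of `path` without trailing newlines.

    With errors="replace", bytes that are invalid in `encoding` become U+FFFD;
    with errors="ignore" they are dropped. Neither raises UnicodeDecodeError.
    """
    with codecs.open(path, "r", encoding=encoding, errors=errors) as f:
        return [line.rstrip("\r\n") for line in f]


# Demo on a throwaway file containing the byte 0x8a, which is not valid UTF-8.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"id,name\n1,caf\x8a\n")

replaced = read_lines_lenient(demo_path)
ignored = read_lines_lenient(demo_path, errors="ignore")
os.remove(demo_path)

print(replaced)
print(ignored)
```

Both handlers lose information. If the file is really in some other encoding, passing that encoding's name instead of "utf-8" would decode it without loss.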

zipa
  • Thanks @Boris, going to try your suggestion – user1195192 Sep 08 '16 at 15:35
  • Please read the edit, that '\n' gave me headaches more than once :) – zipa Sep 08 '16 at 15:37
  • This is how I am reading at the moment: with open('myFile.csv', 'rb') as ifile: reader = csv.reader(ifile) for rownum, row in enumerate(reader): – user1195192 Sep 08 '16 at 15:46
  • You can first replace **open** inside your with statement with **codecs.open**, as I suggested in my answer. Then use for rownum, row in enumerate(ifile): row = row.rstrip() as the first line of your iteration, and it should replace your csv.reader() call. – zipa Sep 08 '16 at 15:53
1

I was getting this error while reading the README for the long description in setup.py. If you are using the built-in open (Python 3), you can pass the encoding parameter:

with open("README.md", "r", encoding='utf_8') as f:
    long_description = f.read()
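
For the CSV case in the question, the same encoding parameter can be combined with csv.reader. This is only a sketch for Python 3: the helper `read_csv_rows` and the demo file are hypothetical, and UTF-8 is an assumption about the data.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as lists of str, decoded with `encoding`."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Demo on a throwaway UTF-8 file with a non-ASCII character.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write("id,name\n1,Škoda\n".encode("utf-8"))

rows = read_csv_rows(demo_path)
os.remove(demo_path)

print(rows)
```

Unlike iterating over raw lines and calling rstrip(), this keeps the csv module's handling of quoted fields and embedded commas.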
Shital Shah