
I'm parsing a CSV as follows:

with open(args.csv, 'rU') as csvfile:
    try:
        reader = csv.DictReader(csvfile, dialect=csv.QUOTE_NONE)
        for row in reader:
            ...

where args.csv is the name of my file. One of the rows in my file contains an e with two dots on top (ë). My script breaks when it encounters this.

I get the following stack trace:

File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)

and the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 5: invalid start byte

FWIW, I'm running Python 2.7 and upgrading isn't an option (for a few reasons).

I'm pretty lost about how to fix this so any help is much appreciated.

Thanks!

anon_swe

1 Answer


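Byte 0x91 is a "smart" opening single quote (U+2018) in the Windows-1252 encoding, so your file is most likely encoded as Windows-1252, not UTF-8. On Python 3, pass the encoding when opening the file: open(args.csv, newline='', encoding='windows-1252'). On Python 2.7 the built-in open() does not accept an encoding argument, so read the file as bytes and decode each field with .decode('windows-1252') before passing it to json.dumps.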
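
To check the diagnosis, you can decode a few bytes by hand. This is a minimal sketch: the sample bytes and the try_decode helper are made up for illustration and are not taken from the asker's file. It runs on both Python 2.7 and Python 3.

```python
# Hypothetical sample bytes: 0x91 and 0x92 are the Windows-1252
# "smart" opening and closing single quotes.
raw = b"name \x91x\x92"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0x91 is not a valid UTF-8 start byte, so this decode fails.
as_utf8 = try_decode(raw, "utf-8")

# The same bytes are valid Windows-1252 and decode to U+2018 and U+2019.
as_cp1252 = try_decode(raw, "windows-1252")
```

If the UTF-8 attempt fails and the Windows-1252 attempt produces readable text, that is good evidence the file is Windows-1252.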

C. K. Young
  • When I follow your answer, I get: "TypeError: 'encoding' is an invalid keyword argument for this function". Fwiw, I'm running Python 2.7 and (for a few reasons) can't change that. – anon_swe Jun 24 '16 at 18:04
  • @bclayman It is preferable that you mention that in your question, even though it is mentioned in the stacktrace. – DeepSpace Jun 24 '16 at 18:07
  • Great answer! I managed to convert a file in the Uzbek language to UTF-8 with `iconv -t UTF-8 -f Windows-1252 in.xml`. I would've spent a lot of time guessing what the 0x91 and 0x92 characters mean. – Boris Treukhov Feb 25 '18 at 18:46
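
For the Python 2.7 case raised in the comments, one possible workaround is to let csv parse the raw bytes and decode each field afterwards. The sketch below assumes the file really is Windows-1252. decode_row is a hypothetical helper, and the sample row is made up to stand in for what csv.DictReader yields on Python 2.7 when the file is opened with 'rb'.

```python
import json


def decode_row(row, encoding="windows-1252"):
    """Return a copy of a csv.DictReader row with byte strings decoded to text."""
    def to_text(value):
        # On Python 2, str is bytes and needs decoding; anything else
        # (already-decoded text, None) is passed through unchanged.
        return value.decode(encoding) if isinstance(value, bytes) else value

    return {to_text(key): to_text(value) for key, value in row.items()}


# Hypothetical row of raw bytes, standing in for one DictReader result.
row = {b"name": b"\x91x\x92"}

# After decoding, json.dumps no longer has to guess the encoding.
encoded = json.dumps(decode_row(row))
```

In the original loop this would mean calling decode_row(row) on each row before it reaches json.dumps. On Python 2.7 only, json.dumps also takes an encoding argument (default 'utf-8'), so json.dumps(row, encoding='windows-1252') may be enough on its own.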