18

I'm trying to import a CSV, using this code:

    import csv
    import sys

    def load_csv(filename):
        # Open file for reading
        file = open(filename, 'r')

        # Read in file
        return csv.reader(file, delimiter=',', quotechar='\n')

    def main(argv):
        csv_file = load_csv("myfile.csv")

        for item in csv_file:
            print(item)

    if __name__ == "__main__":
        main(sys.argv[1:])

Here's a sample of my csv file:

    foo,bar,test,1,2
    this,wont,work,because,α

And the error:

    Traceback (most recent call last):
      File "test.py", line 22, in <module>
        main(sys.argv[1:])
      File "test.py", line 18, in main
        for item in csv_file:
      File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)

Obviously, it's hitting the character at the end of the CSV and throwing that error, but I'm at a loss as to how to fix this. Any help?

This is:

    Python 3.2.3 (default, Apr 23 2012, 23:35:30)
    [GCC 4.7.0 20120414 (prerelease)] on linux2
Ryan Rapini

3 Answers

22

It seems your problem boils down to:

    print("α")

You could fix it by specifying PYTHONIOENCODING:

    $ PYTHONIOENCODING=utf-8 python3 test.py > output.txt

Note:

    $ python3 test.py

should work as is if your terminal configuration supports it, where test.py is:

    import csv

    with open('myfile.csv', newline='', encoding='utf-8') as file:
        for row in csv.reader(file):
            print(row)

If the open() call above has no encoding parameter, you'll get UnicodeDecodeError with LC_ALL=C.
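
To see why the missing encoding matters, here is a minimal sketch that decodes the question's two sample lines in memory instead of reading a file. It assumes the file was saved as UTF-8, and it uses ASCII to stand in for the default encoding that open() picks under LC_ALL=C on older Python 3 versions. The names SAMPLE, decode_error and rows are only illustrative.

```python
import csv
import io

# The two sample lines from the question, encoded as UTF-8 (an assumption
# about how myfile.csv was saved).
SAMPLE = "foo,bar,test,1,2\nthis,wont,work,because,α\n".encode("utf-8")

# Decoding as ASCII stands in for open() without encoding= under LC_ALL=C:
# it fails on the first byte of "α".
try:
    SAMPLE.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding as UTF-8 succeeds, and csv.reader parses both rows.
rows = list(csv.reader(io.StringIO(SAMPLE.decode("utf-8"), newline="")))
```

In this sketch the failing byte is 0xce at offset 40, the same byte and position reported in the question's traceback, which suggests the error there is raised while reading the file rather than while printing.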

Also, with LC_ALL=C you'll get UnicodeEncodeError even if there is no redirection, so PYTHONIOENCODING is necessary in that case (before PEP 538, Legacy C Locale Coercion, implemented in Python 3.7+).
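
The encode side can be sketched in the same way. This example assumes that a text stream wrapped around an in-memory buffer behaves like sys.stdout with a given encoding. write_row is a hypothetical helper written for this illustration, not part of any library.

```python
import io

def write_row(row, encoding):
    """Print `row` to an in-memory text stream using `encoding`; return the bytes written."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    print(row, file=stream)
    stream.flush()
    return buffer.getvalue()

row = ["this", "wont", "work", "because", "α"]

# A UTF-8 stream (what PYTHONIOENCODING=utf-8 gives you for stdout) accepts "α".
utf8_bytes = write_row(row, "utf-8")

# An ASCII stream (stdout under LC_ALL=C on older Python 3) rejects it.
try:
    write_row(row, "ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") at the start of the script is an in-process alternative to setting the environment variable.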

jfs
13

According to the Python docs, you have to set the encoding for the file. Here is an example from the docs:

    import csv

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

Edit: Your problem appears to happen when printing. Try using the pretty printer:

    import csv
    import pprint

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            pprint.pprint(row)
TheDude
  • 3
    Setting the encoding for the file does nothing to fix the issue... `file = open(filename, 'r', encoding='utf-8')` still gives me `UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)` – Ryan Rapini Oct 05 '12 at 19:09
  • Ah, it has to do with `print` not being able to display unicode characters. This question on Quora may have the answer -- it uses pretty printer: http://www.quora.com/How-do-you-print-a-python-unicode-data-structure – TheDude Oct 05 '12 at 19:21
  • 1
    I think the error has nothing to do with the print at all. It's hitting the error at the beginning of the for loop, before the print() even runs. Your edited sample code using pprint yields the same error as before, further reinforcing this claim. I'm stumped. – Ryan Rapini Oct 05 '12 at 19:36
  • 2
    `export PYTHONIOENCODING=utf-8` fixed my issue. – Ryan Rapini Oct 05 '12 at 19:46
  • @betaRepeating "export PYTHONIOENCODING=utf-8 fixed my issue." could you explain further? – Inês Martins Aug 18 '16 at 14:27
5

Another option is to cover up the errors by passing an error handler:

    with open('some.csv', newline='', errors='replace') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

which will replace any undecodable bytes in the file with a "missing character" (the Unicode replacement character, U+FFFD).
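
As a rough illustration of what that handler does to the question's data, the sketch below writes the two sample lines to a temporary file as UTF-8 and reads them back as ASCII with errors='replace'. The explicit encoding='ascii' is an assumption that mimics the C locale default, and the temporary file and variable names are only for this example.

```python
import csv
import os
import tempfile

# Write the question's sample as UTF-8 to a temporary file (illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="utf-8",
                                 newline="", delete=False) as tmp:
    tmp.write("foo,bar,test,1,2\nthis,wont,work,because,α\n")
    path = tmp.name

try:
    # Force ASCII to mimic the C locale; each undecodable byte becomes U+FFFD.
    with open(path, newline="", encoding="ascii", errors="replace") as f:
        replaced_rows = list(csv.reader(f))
finally:
    os.remove(path)
```

The read no longer raises, but the two UTF-8 bytes of "α" come back as two U+FFFD characters, so the original text is lost. Passing the correct encoding is preferable whenever it is known.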