I generated a CSV via Excel, and when printing the key names I get some weird characters prepended to the first key, like so:

keys(['row1', 'row2'])

import csv

path = 'C:\\Users\\asdf\\Desktop\\file.csv'
with open(path, 'r') as file:
    reader = csv.DictReader(file)

    for row in reader:
        print(row.keys())

However, if I create the CSV in the IDE instead, everything works fine and no strange characters are printed. How can I read the Excel CSV so that the strange characters are stripped?

  • Probably a UTF byte order mark. Use "utf-8-sig" as the file encoding in "open". – Michael Butscher Apr 13 '20 at 21:38
  • @MichaelButscher Hello, why is it happening, though? The Python documentation says: "On encoding, a UTF-8 encoded BOM will be prepended to the UTF-8 encoded bytes. For the stateful encoder this is **only done once** (on the first write to the byte stream)." https://docs.python.org/3/library/codecs.html#module-encodings.utf_8_sig I would expect the strange characters to appear only once, before the start of the first row. – Poutrathor Jan 22 '23 at 20:25
  • @Poutrathor Usually it only appears at the beginning of the file, but this depends on the way the encoder is used. The BOM appears as part of the first row (not before it) because it is not followed by a newline, so the CSV reader sees it as characters belonging to the first entry. Here the first row contains the column titles, which the DictReader repeats as dict keys for every row although they are present in the file only once. – Michael Butscher Jan 22 '23 at 21:42
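
To illustrate the comments above, here is a minimal sketch of how the same bytes turn into different first keys depending on the encoding used to decode them. The sample bytes and the helper name `first_row_keys` are made up for the demonstration, and it assumes the file was opened with a Windows default encoding such as cp1252, which the question does not state.

```python
import csv
import io

# Hypothetical file content: the UTF-8 BOM (EF BB BF) followed directly
# by the header line, with no newline in between.
raw = b'\xef\xbb\xbfrow1,row2\r\n1,2\r\n'

def first_row_keys(data, encoding):
    """Decode the bytes with the given encoding and return the DictReader keys."""
    text = io.StringIO(data.decode(encoding), newline='')
    reader = csv.DictReader(text)
    return list(next(reader).keys())

# Assumed Windows default encoding: the three BOM bytes become three visible characters.
keys_cp1252 = first_row_keys(raw, 'cp1252')
# Plain UTF-8: the BOM decodes to the single invisible character U+FEFF.
keys_utf8 = first_row_keys(raw, 'utf-8')
# utf-8-sig: the BOM is consumed by the decoder and never reaches the csv module.
keys_sig = first_row_keys(raw, 'utf-8-sig')

print(keys_cp1252)
print(keys_utf8)
print(keys_sig)
```

In all three cases the BOM is in the data only once; it just ends up glued to the first header name unless the decoder removes it.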

1 Answer

with open(path, 'r', encoding='utf-8-sig') as file:
    reader = csv.DictReader(file)

This worked.
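
For completeness, a self-contained sketch of the same fix applied to the reading loop from the question. The temporary file stands in for the Excel export and its contents are invented; `read_rows` is a hypothetical helper name, and the `newline=''` argument is an addition based on the csv module's general recommendation, not something the original code used.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Read a CSV into a list of dicts, dropping a leading UTF-8 BOM if present."""
    # 'utf-8-sig' skips the BOM when there is one and otherwise behaves like 'utf-8'.
    with open(path, 'r', encoding='utf-8-sig', newline='') as file:
        return list(csv.DictReader(file))

# Hypothetical stand-in for the Excel export: a file that starts with a BOM.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.csv')
    with open(path, 'wb') as f:
        f.write(b'\xef\xbb\xbfrow1,row2\r\n1,2\r\n')
    rows = read_rows(path)

for row in rows:
    print(list(row.keys()))
```

Because `utf-8-sig` also reads files without a BOM, the same call should work for the CSV created in the IDE.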
