KeyError, related to ANSI vs UTF-8 Codec?

Question

I'm new to python and programming, so I'm sure my code is the opposite of 'pythonic'.

My goal is to grab an excel file from a server, open it, write the data in the file to a new file. Next I open that file and use dictReader to print a few columns from the file. First I wrote a script to just open an excel file and print the columns. I was able to achieve this. Now I'm adding the next step of grabbing the file, writing it to a local file and then opening it.

I'm receiving the following error which I've researched a fair amount prior to posting:

  File "request.py", line 20, in <module>
    if line['Change'] == ticket_ID:
KeyError: 'Change'

Below is my code:

import csv
import sys
import urllib2

data = urllib2.urlopen("URL-HERE")

new_file = open("c:\\file-here", "w")
for fields in data:
    new_file.write(fields)
new_file.close()

test_file = 'getrequest6.csv'

csv_file = csv.DictReader(open(test_file, 'r'))

# ticket_ID = sys.argv[1]
ticket_ID = "RMSDB00010243"

for line in csv_file:
    if line['Change'] == ticket_ID:
        change_col = line['Change']
        review_col = line['Review']
        phase_col = line['Phase']
        print change_col, review_col, phase_col

What I've found is that the file I manually download from the server is of ANSI codec. This file works when I open it. The new file I create and write (getrequest6) is a UTF-8 codec. I'm assuming that is the reason for the KeyError because when I copy and paste the data from the getrequest6 file, to the ANSI file, my code above works.

Any help would be appreciated. I searched on the codecs class but I do not understand it.

No, the key `'Change'` will automatically be decoded to Unicode (using `'ASCII'` as the default encoding) as needed to compare against unicode values. — Martijn Pieters, Feb 27 '13 at 21:11
Do `print csv_file.next()` to get an idea of what keys *are* present in the file. — Martijn Pieters, Feb 27 '13 at 21:11
To save url to a local file, try: `import urllib; urllib.urlretrieve("URL_HERE", r"C:\file-here")` — jfs, Feb 27 '13 at 21:16
interesting, when I do 'print csv_file.next()' it prints my third column first, then it prints the first column with this: "\xef\xbb\xbfChange", when the column name is "Change"? — RJP, Feb 27 '13 at 21:16
@J.F.Sebastian, thanks, I tried that, and it seems like a much better way than opening and writing the file! — RJP, Feb 27 '13 at 21:21

score 1 · Answer 1 · answered Feb 27 '13 at 21:15

1

If the problem lies in the source csv files that some lines just lack 'Change', then what you can do (also a good practice in general) is to check the availability of that key in the dict:

# ...
if 'Change' in line and line['Change'] == ticket_id:
    # the rest

answered Feb 27 '13 at 21:15

woozyking

4,880
1
23
29

3

you could check it as: `if line.get('Change') == ticket_id:`. `.get()` returns `None` if there is no value for `'Change'` key. – jfs Feb 27 '13 at 21:17
when I do that it returns none. So that means there is no value for the 'Change' key? – RJP Feb 27 '13 at 21:28
@RJP with `Change in line` you can check whether the key exists or not in line; with `line.get('Change') -> None`, you either have the possibility of key 'Change' does not exist and **default return** of `dict.get()` is None, or that the key does in fact exist and the value of that key is None (not likely, but possible from a pure Python perspective). Now to avoid this, you can do `line.get('Change', "KEY_NOT_EXIST")` which will return "KEY_NOT_EXIST" or anything you'd like to define if the key is not in the dict, to rule out one of the two possibilities aforementioned. – woozyking Feb 27 '13 at 21:32
@woozyking: But you already know `None == ticket_id` is false, so there's no reason to treat non-existent and `None` as different in this case. – abarnert Feb 27 '13 at 21:39
@woozyking I have the following: `for line in csv_file: if line.get('Change', "KEY_NOT_EXIST") == ticket_ID: change_col = line['Change'] review_col = line['Review'] phase_col = line['Phase'] print change_col, review_col, phase_col` – RJP Feb 27 '13 at 21:40
@woozyking, so I tried printing every other column in the excel file, and every Key works except for the "Change" one which is the first one? I wonder why that is? – RJP Feb 27 '13 at 21:59
I think the problem is indeed related to BOM that @J.F.Sebastian mentioned. Try [this](http://stackoverflow.com/a/904085/158111) or else a lazier way try line.get("\xef\xbb\xbfChange") or line.get(u"\xef\xbb\xbfChange") depends on the key type. – woozyking Feb 27 '13 at 22:21

jfs · Answer 2 · 2013-03-01T07:01:32.723

1

it prints the first column with this: "\xef\xbb\xbfChange", when the column name is "Change"

b"\xef\xbb\xbf" is BOM in utf-8.

The csv module doesn’t directly support reading and writing Unicode. But if you remove the BOM then it should be OK to use csv module with data encoded using utf-8:

import os
import shutil
from codecs import BOM_UTF8

with open(test_file, 'rb') as file, open(test_file+".utf8", 'wb') as outfile:
     chunk = file.read(len(BOM_UTF8))
     if chunk != BOM_UTF8: # don't write BOM
        outfile.write(chunk)
     shutil.copyfileobj(file, outfile)
os.remove(test_file)
os.rename(test_file+".utf8", test_file)

edited Mar 01 '13 at 07:01

answered Feb 27 '13 at 21:49

jfs

399,953
195
994
1,670

when I import shutil I get `AttributeError: module' object has no attribute 'copy2'` – RJP Feb 27 '13 at 22:10
@RJP: check that you don't have Python files in the current directory that have the same names as stdlib modules e.g., there should be no `copy.py`, `shutil.py` in cwd. Otherwise they are used instead of standard modules. – jfs Feb 27 '13 at 22:36
this worked, thanks a lot! I'll read up on BOM as I've never heard of that – RJP Mar 07 '13 at 15:57

KeyError, related to ANSI vs UTF-8 Codec?

2 Answers2