0

The following script is erroring out:

import csv,time,string,os,requests, datetime

test = "\\\\network\\Shared\\test.csv"

fields = ["id", "Expiration Date", "Cost", "Resale" ]

with open(test) as infile, open("c:\\upload\\tested.csv", "wb") as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, fields, extrasaction="ignore")
    r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r)

    wtr = csv.writer( outfile )                                                                                                                                                                                                                                                                                                                                         
    wtr.writerow(["id", "upload_date", "cost", "resale"])
    for i, row in enumerate(r, start=1):
        row['id'] = i
        print(row['Expiration Date']
        row['Expiration Date'] = datetime.datetime.strptime(row['Expiration Date'][:10], "%m/%d/%Y").strftime("%Y-%m-%d")
        w.writerow(row)


D:\Python\Scripts>python test.py
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    print(row['Expiration Date'])
KeyError: 'Expiration Date'

So I think I understand what's going on - something like this from the original file:

Expiration Date     Cost     Resale
2016-01-01          1.00     2.00
                    1.42     2.42
2016-05-02          1.45     9.00

From what I can gather, there is a row where the expiration date column is NOT populated. How do I force DictWriter to skip over blanks - assuming that is the cause of my error?

Brian Powell
  • 3,336
  • 4
  • 34
  • 60

2 Answers2

1

Actually, the dict produced by the csv.DictReader just puts None into a field it does not find and thus you should not get that error. You are not using the functionality of the DictReader to produce a proper dict! As far as I can tell, you try to do the parsing yourself by use of the line r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r). That does not actually work, though. If you print the rows afterwards you get:

{'Expiration Date     Cost     Resale': '2016-01-01          1.00     2.00'}
{'Expiration Date     Cost     Resale': '1.42     2.42'}
{'Expiration Date     Cost     Resale': '2016-05-02          1.45     9.00'}

So every dict contains only one key. A problem with your file is, that you don't have a valid delimiter between lines. It looks like you mean to use a whitespace, but you have a whitespace in Expiration Date, as well. You will have to get rid of that. If you do that, then you can use the DictReader like this:

import csv,time,string,os,requests, datetime

test = "test.csv"

with open(test) as infile:
    r = csv.DictReader(infile, delimiter=" ", skipinitialspace=True)
for row in r:
    print(row)

will now give you:

{'Resale': '2.00', 'Cost': '1.00', 'ExpirationDate': '2016-01-01'}
{'Resale': None, 'Cost': '2.42', 'ExpirationDate': '1.42'}
{'Resale': '9.00', 'Cost': '1.45', 'ExpirationDate': '2016-05-02'}

which is a proper dict (Notice that the reader has no way of telling, that the first element is the one missing, though). Now you only have to exclude lines that are not complete from writing. A nice way to do that is described here:

import csv,time,string,os,requests, datetime

test = "test.csv"

with open(test) as infile:
    r = csv.DictReader(infile, delimiter=" ", skipinitialspace=True)

    for row in r:
        if not any(val in (None, "") for val in row.itervalues()):
            print(row)

Finally, this will give you all valid lines as dicts:

{'Resale': '2.00', 'Cost': '1.00', 'ExpirationDate': '2016-01-01'}
{'Resale': '9.00', 'Cost': '1.45', 'ExpirationDate': '2016-05-02'}
Community
  • 1
  • 1
jotasi
  • 5,077
  • 2
  • 29
  • 51
  • I didn't really understand the `r = (dict((k, v.strip()) for k, v in row.items() if v) for row in r)` line - the code itself came from an SO post I made about trimming white space on each value before it was written, but you're right, when I remove that line, it solves the problem of this particular thing happening - but the code WAS stripping white space, so now I need to find a new way to trim white space from my output file... – Brian Powell Aug 02 '16 at 18:12
  • @BrianPowell The `skipinitialspace=True`parameter that I used in my sample strips all leading whitespaces. Isn't that what you need? – jotasi Aug 02 '16 at 18:15
  • I have several values in the sheet (which is like 50 columns, not the 3 I showed here) which have white space at the beginning, or at the end: ` hello ` - with the code from my comments above, it did trim that white space out, but I guess it also screwed up my dictionary. My understanding is that `skipinitialspace` only trims white space from the beginning of a value. – Brian Powell Aug 02 '16 at 18:17
  • The problem is not that line but that you don't have commas in you comma-separated-values file and you didn't specify any other delimiter. If you specify `delimiter=" "` then he is able to correctly produce your dictionary (as long as none of the keys/values conatain whitespaces). And then `skipinitialspace=True` skips all whitespaces, since there will be none after an entry (as the first whitespace after an entry is treated as `delimiter`). – jotasi Aug 02 '16 at 19:14
  • the actual file is comma delimited - sorry to make you think otherwise. – Brian Powell Aug 02 '16 at 20:01
  • 1
    Ok then you can actually use something like the line you had. First you will have to change `delimiter=","` and afterwards you can insert `r = [{k.strip(): v.strip() for k, v in row.items()} for row in r]`. That will work for python 2.7.2 and newer. Then you can use the `if` clause later to excempt all `row`s with a missing entry. – jotasi Aug 03 '16 at 04:10
  • Beautiful - I knew the logic of that statement would work in the end, I just wasn't sure how to convert it from trimming everything including the `dict` to just trimming the rows. Thank you SO much for your help on this! – Brian Powell Aug 03 '16 at 15:19
0

You got a KeyError accessing something not in the dict at x['Expiration Date'] so you could say x.get('Expiration Date') or possibly 'Expiration Date' in x instead to discover if it exists and conditionally discard that row.

maxpolk
  • 2,167
  • 1
  • 18
  • 24
  • Actually, his read in is flawed and he never has anything else but `"Expiration Date Cost Resale"` as a `key` in his `dict`. – jotasi Aug 02 '16 at 15:56