Parse csv to dict

Question

I am trying to parse csv financial data from the web into a dict that I can navigate through by key.
I am failing using csv.DictReader.
I have:

import csv
import urllib2
req = urllib2.Request('http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:BRCM&region=usa&culture=en-US&cur=USD&reportType=is&period=12&dataType=A&order=desc&columnYear=5&rounding=3&view=raw&r=886300&denominatorView=raw&number=3')
response = urllib2.urlopen(req)
response.next() 
csvio = (csv.DictReader(response))
print csvio.fieldnames 
print len(csvio.fieldnames)

Edited to reflect changes from answer below.

This almost gets me there, but I need to strip the leading 'Fiscal year...share data' before feeding it to DictReader. How best to do this? I have looked at converting to string and stripping lead chars with str.lstrip() as the docs say here with no luck.

Incidentally, if you're going to be doing a significant amount of work with tabular data, I recommend taking a look at [pandas](http://pandas.pydata.org). You're likely to spend a fair bit of time reinventing its functionality otherwise. — DSM, Sep 19 '13 at 21:58

nakedfanatic · Accepted Answer · 2013-09-19T21:43:19.423

1

To use a DictReader you need to either specify the field names, or the field names need to be the first row of the csv data (ie. a header row).

In the csv file that your code retrieves, the field names are in the second row of data, not the first. What I've done is to throw out the first line of data before passing the csv file to the DictReader constructor.

In response to your updated question: Stripping the leading text from the header row probably isn't desirable as this is acting as the field name for the first column of data. Probably better to discard the first 2 rows of data and then supply the desired field names directly to the DictReader. I have updated the example below to reflect this.

import csv
import urllib2

req = urllib2.Request('http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:BRCM&region=usa&culture=en-US&cur=USD&reportType=is&period=12&dataType=A&order=desc&columnYear=5&rounding=3&view=raw&r=886300&denominatorView=raw&number=3')
response = urllib2.urlopen(req)

response.readline() # This reads (and discards) the first row of data which is not needed.
response.readline() # skip the 

myFieldnames = ["firstColName", "TTM", "2012", "2011", "2010", "2009", "2008"]
csvio = csv.DictReader(response, fieldnames=myFieldnames)

print csvio.fieldnames 
for row in csvio:
    print row

edited Sep 19 '13 at 21:43

answered Sep 19 '13 at 03:36

nakedfanatic

3,108
2
28
33

Your response led me to the .next method for iterables. I now have: 'req = urllib2.Request('http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:BRCM&region=usa&culture=en-US&cur=USD&reportType=is&period=12&dataType=A&order=desc&columnYear=5&rounding=3&view=raw&r=886300&denominatorView=raw&number=3') response = urllib2.urlopen(req) response.next() csvio = (csv.DictReader(response)) print csvio.fieldnames print len(csvio.fieldnames)'. This almost gets me there, but I need to strip the leading 'Fiscal year...share data.' before feeding it to DictReader. How best to do this? – dman Sep 19 '13 at 20:22
I don't think stripping that leading text would be a good idea as it is acting as the field name for the first column of data. One solution would be to skip the first 2 rows of data, and then supply the desired field names directly to the DictReader. eg: `csvio = csv.DictReader(response, fieldnames=["firstColName", "2012", "2011", "2010", "2009", "2008"])` – nakedfanatic Sep 19 '13 at 21:36
Sorry, missed a column name. Should be `csvio = csv.DictReader(response, fieldnames=["firstColName", "TTM", "2012", "2011", "2010", "2009", "2008"])`. I've updated my answer to reflect this anyway. – nakedfanatic Sep 19 '13 at 21:44
This got me over the hump. How do you know that the leading text is the field name for the first column (or the structure at all when it is a urllib2.urlopen() object)? Trying to understand so I can ask more intelligent questions in the future (or solve before having to ask). Thanks. – dman Sep 19 '13 at 22:21
@dman I downloaded the csv file and had a look at it in a text editor. I saw that the first row didn't contain field names at all, and the second row contained field names, but the first one didn't really make sense. The header row should contain one descriptive field name for each field in the rows of data that follow. In the real world we can't always rely on the source data being properly formatted like this though, which is why it's good to inspect the data and identify any workarounds you might have to perform. – nakedfanatic Sep 19 '13 at 22:28

Parse csv to dict

1 Answers1