I have a CSV file with columns A, B, C, D and N rows. The problem is that the values in these columns do not all have the same precision, i.e. some are 4.5 and some are 4.52.
My question is in two parts:
How do I access these columns from the CSV file? I've used this code to print the contents of the CSV file and to read them into an array:
    import csv
    with open('file.csv','rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row
This prints the rows in the CSV file. I then replaced
    print row
with
    z = row
    z.append(z)
to save the data into an array.
But z is a 1-D array, and the data is of type string. When I try operations such as np.median(z), I get an error. Also, I cannot do
    z.append(float(z))
This gives me a TypeError.
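For illustration, here is a minimal sketch of one way to get numeric columns out of the file. The TypeError arises because float() is being handed the whole list z rather than a single string, and z.append(z) appends the list to itself instead of collecting rows. The sketch assumes Python 3 (the code above is Python 2), a header row followed by purely numeric rows, and uses a hypothetical in-memory sample in place of file.csv; the helper name read_numeric_rows is made up for this example.

```python
import csv
import io
from statistics import median

# Hypothetical sample standing in for file.csv: a header row plus numeric rows.
SAMPLE = "A,B,C,D\n4.5,4.52,1.0,2.0\n4.3,4.56,3.0,4.0\n4.299,4.1,5.0,6.0\n"


def read_numeric_rows(handle, skip_header=True):
    """Return a list of rows, each row a list of floats."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)  # discard the row of column names
    # Convert cell by cell; float() accepts one string, not a whole list.
    return [[float(cell) for cell in row] for row in reader if row]


rows = read_numeric_rows(io.StringIO(SAMPLE))
columns = list(zip(*rows))     # transpose: columns[0] holds column A
median_a = median(columns[0])  # median of column A
print(median_a)
```

With a real file, the io.StringIO(SAMPLE) argument would be replaced by a handle from open('file.csv', newline=''). Each entry of columns could also be passed to np.median, since it now holds floats rather than strings.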
Second, is there any way to truncate the values and set them to a certain precision while importing them from the CSV file? For example, if the file has values like 4.3, 4.56, 4.299, ..., I want to constrain what I finally import to just one decimal place.
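A rough sketch of what this could look like, assuming Python 3 and the three example values above as a one-line sample. Note that rounding and truncating give different results (4.56 rounds to 4.6 but truncates to 4.5), so the sketch shows both; the helper name to_one_decimal and its truncate flag are invented for this example.

```python
import csv
import io
from decimal import Decimal, ROUND_DOWN

# The example values from the question, as a one-line CSV sample.
SAMPLE = "4.3,4.56,4.299\n"


def to_one_decimal(text, truncate=False):
    """Convert one CSV cell (a string) to a float with one decimal place."""
    if truncate:
        # Drop the extra digits without rounding, working on the decimal text.
        return float(Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_DOWN))
    return round(float(text), 1)


row = next(csv.reader(io.StringIO(SAMPLE)))
rounded = [to_one_decimal(cell) for cell in row]
truncated = [to_one_decimal(cell, truncate=True) for cell in row]
print(rounded)
print(truncated)
```

Truncation goes through Decimal so that the digits are cut from the text as written in the file, rather than from the binary floating-point approximation.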
This SE question is the closest to answering my 2nd question - Python - CSV: Large file with rows of different lengths - but I do not understand it. If anyone can help me with this, I'd be thankful.
EDIT 1 : @ Richie : here's a sample data set - http://goo.gl/io8Az. It links to a Google doc. Regarding your comment, this was the outcome when I ran your code on my CSV file -
ValueError: could not convert string to float: plate
@ Pieters : z = row, z.append(z) created this - ['3836', '55302', '402', '22.945717', '22.771544', '23.081865', '22.428421', '21.78294', '164.40663689', '-1.25641627', '1.780485', '1237674648848106129', [...]].
I should've mentioned that I've just started using Python and I'm learning things on a need-to-know basis! I'm improvising with bits and pieces of code I'm finding on the web.
EDIT 2: I've heard about pandas. I guess I should start using it.
@ Khalid - I've run your code and I'm able to retrieve the column I want. Instead of printing the whole row out, can I access the column itself, as a static array?
EDIT 3: @ Richie : the first time I ran your code, this showed up -
Traceback (most recent call last):
  File "", line 4, in
ValueError: could not convert string to float: plate
Well, I realized that the first row, which contains the column names, is the cause. So I removed the first row, saved the result as a new file, ran the code on that file, and it worked perfectly fine.
But if I remove the first line, which contains the column identifiers, I cannot use the method mentioned by Khalid below. I am looking at pandas in the meanwhile.
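One possible way around this, sketched under assumptions: csv.DictReader from the standard library uses the first row as column names, so the header can stay in the file while the remaining rows are converted to floats. This is Python 3 and is not necessarily the method Khalid suggested. The sample data is hypothetical; only the column name "plate" comes from the error message above, while "mag" and the helper read_columns are made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; "plate" is from the error message, "mag" is invented.
SAMPLE = "plate,mag\n3836,22.945717\n3837,22.771544\n3838,23.081865\n"


def read_columns(handle):
    """Return {column name: list of floats}, taking names from the first row."""
    columns = defaultdict(list)
    for record in csv.DictReader(handle):  # the header row is consumed as keys
        for name, value in record.items():
            columns[name].append(float(value))
    return dict(columns)


columns = read_columns(io.StringIO(SAMPLE))
plate = columns["plate"]  # one column, as a plain list of floats
print(plate)
```

Each column is then available by name as an ordinary list, which can be kept around instead of being printed row by row. pandas.read_csv also treats the first row as the header by default, so it handles the same situation.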
Thanks for everything guys :)
EDIT 4 : Lesson Learnt. Pandas is Awesome. Job Done :)...