
I have a CSV file with columns A, B, C, D and N rows. The problem is that the values in these columns do not all have the same number of digits, i.e. some are 4.5 and some are 4.52.

My question is in two parts:

How do I access these columns from the CSV file? I've used this code to print the contents of the CSV file and to read them into an array

    import csv
    with open('file.csv','rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row

to print the rows in the CSV file, and I replaced

    print row 

with

    z = row
    z.append(z)

to save the data into an array.

But z is a 1-D array, and the data is of type string. When I try performing operations such as np.median(z), it gives me an error. Also, I cannot do

    z.append(float(z))

This is giving me a TypeError.

Also, is there any way to truncate the values and set them to a certain precision while importing them from the CSV file? For example, if the file has values like 4.3, 4.56, 4.299, ..., I want to constrain what I finally import to just one decimal place.

This SE question is the closest to answering my second question - Python - CSV: Large file with rows of different lengths - but I do not understand it. If any of you can help me with this, I'd be thankful.

EDIT 1: @Richie: here's a sample data set - http://goo.gl/io8Az. It links to a Google doc. Regarding your comment, this was the outcome when I ran your code on my CSV file -

     ValueError: could not convert string to float: plate

@Pieters: z = row, z.append(z) created this - ['3836', '55302', '402', '22.945717', '22.771544', '23.081865', '22.428421', '21.78294', '164.40663689', '-1.25641627', '1.780485', '1237674648848106129', [...]].

I should've mentioned that I've just started using Python and I'm learning things on a need-to-know basis! I'm improvising with bits and pieces of code I'm finding on the web.

EDIT 2: I've heard about pandas. I guess I should start using it.

@Khalid - I've run your code and I'm able to retrieve the column I want. Instead of printing the whole row out, can I access it as a static array?

EDIT 3: @richie: the first time I ran your code, this showed up -

    Traceback (most recent call last):
      File "", line 4, in
    ValueError: could not convert string to float: plate

Well, I realized that the first row containing the column names is the cause, so I removed the first row, saved this as a new file, and ran the code on that file. It worked perfectly fine.

But if I remove the first line, which contains the column identifiers, I cannot use the method mentioned by Khalid below. I am looking at pandas in the meantime.

Thanks for everything guys :)

EDIT 4: Lesson learnt. Pandas is awesome. Job done :)

Poruri Sai Rahul

2 Answers

A few things, depending on what you want to do. Here is the simple approach to get them referenced by columns:

    import csv

    with open('file.csv','r') as f:
        reader = csv.DictReader(f, delimiter=',')
        rows = list(reader)

    for row in rows:
        print row['plate']

If you want to convert them to floats or ints, you can use map. However, I suspect you want to do some calculations in the end, and for that it's better to use pandas.

As an added bonus, pandas will give you a 2D grid representation of your file, called a DataFrame.
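The conversion to floats mentioned above can be sketched roughly as follows. This is only an illustration, written for Python 3 rather than the Python 2 used elsewhere on this page. The helper name `read_columns`, the in-memory `SAMPLE` data, and the `mag_u`/`mag_g` column names are hypothetical; only `plate` comes from the question. It assumes every cell in the file is numeric.

```python
import csv
import io

# Hypothetical sample standing in for file.csv; only 'plate' is a name from the question.
SAMPLE = """plate,mag_u,mag_g
3836,22.945717,22.771544
3837,21.5,22.428421
"""

def read_columns(f):
    """Return {column name: list of floats} from a CSV file object with a header row."""
    columns = {}
    for row in csv.DictReader(f):
        for name, value in row.items():
            # float() raises ValueError on non-numeric cells; this sketch assumes there are none.
            columns.setdefault(name, []).append(float(value))
    return columns

columns = read_columns(io.StringIO(SAMPLE))
plate = columns['plate']  # one column as a plain list of floats
print(plate)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open('file.csv', newline='')`. Each column is then an ordinary list that can be passed on to other functions instead of being printed row by row.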

Burhan Khalid

Try this:

    import csv
    import numpy as np

    class onefloat(float):
        def __repr__(self):
            return "%0.1f" % self

    with open('file.csv','rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print map(onefloat,row) # your issue of 1 decimal point is taken care of here
            print '{:.1f}'.format(np.median(map(float,row))) # in case you want this too to be of 1 decimal point

And this is how it is done using pandas:

    import pandas as pd
    data = pd.read_csv('richards_quasar_outliers.csv')
    print data['plate'].median()
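For the plain `csv.reader` loop, a header such as `plate` cannot be passed to `float`, which is what triggers the `ValueError` reported in the question. One possible way around it, sketched here for Python 3 and not part of the original answer, is to consume the header with `next(reader)` before converting. The helper name `read_rounded` and the `SAMPLE` data are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real script would use open('file.csv', newline='').
SAMPLE = """plate,mag_u,mag_g
3836,22.945717,22.771544
3837,21.5,22.428421
"""

def read_rounded(f, ndigits=1):
    """Skip the header row, then return each row as floats rounded to ndigits."""
    reader = csv.reader(f)
    header = next(reader)  # consume the column names so float() never sees them
    rows = [[round(float(value), ndigits) for value in row] for row in reader]
    return header, rows

header, rows = read_rounded(io.StringIO(SAMPLE))
medians = [statistics.median(row) for row in rows]  # per-row median, as in the loop above
print(header)
print(rows)
```

Note that `round` rounds to the nearest value rather than truncating, and that it changes the stored numbers, whereas the `onefloat` class above only changes how they are displayed.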
richie
  • You ran into the error `ValueError: could not convert string to float: plate` because `'plate'` is a header in your CSV file. For more on how to ignore headers while reading a CSV file, see http://stackoverflow.com/a/11350095/1948860. For now, remove the headers and test the code. – richie Jun 19 '13 at 09:55