I have a CSV file that contains a large number of values in 4 different columns. Using Python, is there a way to add up the values in one particular column (say, the values in column 1)? I want to find the average of all the values in that column.

values_list = []

for row in b:
   values_list.append(row[1])

I can isolate a particular column using this method, but is there a way to modify it so that it adds up the values and finds the average for that column?

Thanks in advance

NuNu

1 Answer

Without an example CSV file, I used the following:

1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
0,1,2,3,4
2,3,4,5,6

This Python script loads the CSV file into memory, parses it, collects the values of the n-th column (counting from zero), and computes the sum and mean.

#!/usr/bin/env python

col = 2  # zero-based index of the column to process

values = []
with open('csv.csv', 'r') as csv:
    # readlines() loads the whole file into memory at once
    for line in csv.readlines():
        elements = line.strip().split(',')
        values.append(int(elements[col]))

csum = sum(values)
# float() avoids integer division on Python 2
cavg = csum / float(len(values))
print("Sum of column %d: %f" % (col, csum))
print("Avg of column %d: %f" % (col, cavg))

For example, with col set to 0 and then to 2:

$ python parsecsv.py
Sum of column 0: 6.000000
Avg of column 0: 1.000000

$ python parsecsv.py
Sum of column 2: 18.000000
Avg of column 2: 3.000000
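
Since the loop in the question (for row in b) looks like it iterates over rows from a CSV reader, here is a rough sketch of the same calculation using the standard csv module. The function name column_values and the in-memory sample are my own, made up for illustration; with a real file you would pass the open file object instead.

```python
import csv
import io


def column_values(lines, col):
    """Return the values in zero-based column `col` as floats."""
    values = []
    for row in csv.reader(lines):
        if row:  # skip blank lines
            values.append(float(row[col]))
    return values


# In-memory copy of the sample data above (an assumption for the demo);
# a real script would pass an open file object instead.
sample = io.StringIO(
    "1,2,3,4,5\n"
    "1,2,3,4,5\n"
    "1,2,3,4,5\n"
    "1,2,3,4,5\n"
    "0,1,2,3,4\n"
    "2,3,4,5,6\n"
)

values = column_values(sample, 2)
total = sum(values)
average = total / len(values)
print("Sum of column 2: %f" % total)
print("Avg of column 2: %f" % average)
```

Converting with float() rather than int() means columns with decimal values should work too.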

If the file is too big to load into memory all at once, you can replace the readlines() call with a loop that uses csv.readline() to read one line at a time (csv here is the file object opened in the script, not the csv module).
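
One possible way to do the line-by-line version is to iterate over the file object directly and keep only a running total and a count, so no list of values is built up. This is just a sketch: column_sum_and_mean is a name I made up, and the sample is a shortened in-memory stand-in for a real file.

```python
import io


def column_sum_and_mean(lines, col):
    """Return (sum, mean) of zero-based column `col`, reading line by line."""
    total = 0.0
    count = 0
    for line in lines:  # a file object yields one line at a time
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        total += float(line.split(',')[col])
        count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return total, total / count


# Shortened stand-in for a file (an assumption for the demo); with a real
# file you would use: with open('csv.csv') as f: column_sum_and_mean(f, 0)
sample = io.StringIO("1,2,3,4,5\n0,1,2,3,4\n2,3,4,5,6\n")
total, mean = column_sum_and_mean(sample, 0)
print("Sum of column 0: %f" % total)
print("Avg of column 0: %f" % mean)
```

Only one line is held in memory at a time, so the file size should not matter much.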

jedwards