I am trying to read specific rows of a large CSV file without loading the whole file into memory. The indices of the rows are given in a list L = [2, 5, 15, 98, ...]
and my CSV file looks like this:
Col 1, Col 2, Col3
row11, row12, row13
row21, row22, row23
row31, row32, row33
...
Using the ideas mentioned here, I use the following code to read the rows:
with open('~/file.csv') as f:
    r = csv.DictReader(f) # I need to read it as a dictionary for my purpose
    for i in L:
        for row in enumerate(r):
            print row[i]
I immediately get the following error:
IndexError                                Traceback (most recent call last)
<ipython-input-25-78951a0d4937> in <module>()
      6     for i in L:
      7         for row in enumerate(r):
----> 8             print row[i]
IndexError: tuple index out of range
Question 1. It seems like my use of the for loops here is obviously wrong. Any ideas on how to fix this?
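To show where the IndexError seems to come from, here is a small sketch of what enumerate yields over a DictReader: each item is an (index, row) pair, so row[i] indexes a 2-tuple rather than the file. The sample data and the use of io.StringIO in place of the real file are assumptions for illustration, and the sketch uses Python 3 syntax.

```python
import csv
import io

# Hypothetical in-memory stand-in for the real CSV file.
sample = io.StringIO(
    "Col 1,Col 2,Col3\n"
    "row11,row12,row13\n"
    "row21,row22,row23\n"
)

# enumerate wraps every row in a 2-tuple: (index, row_dict).
pairs = list(enumerate(csv.DictReader(sample)))

# Only positions 0 and 1 exist in each tuple, so indexing it with
# 2, 5, 15, ... raises "tuple index out of range".
first_index, first_row = pairs[0]
print(first_index)          # the enumerate counter
print(first_row["Col 1"])   # the value from the first data row
```

Unpacking the pair, as in `for i, row in enumerate(r)`, gives access to the counter and the row dictionary separately.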
On the other hand, the following gets the job done, but it's too slow:
def read_csv_line(line_number):
    with open("~/file.csv") as f:
        r = csv.DictReader(f)
        for i, line in enumerate(r):
            if i == (line_number - 2):
                return line
    return None

for i in L:
    print read_csv_line(i)
Question 2. Any ideas on how to improve this basic method, which scans the file from the start until it reaches row i and then prints it?
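One possible direction, sketched here under assumptions rather than offered as a definitive fix, is to scan the file only once and keep the rows whose line number is in L. The helper name read_csv_rows and the in-memory sample are hypothetical; the numbering follows the same convention as `line_number - 2` above (line 1 is the header, so the first data row is line 2). The sketch uses Python 3 syntax.

```python
import csv
import io

def read_csv_rows(f, line_numbers):
    """Yield (line_number, row) for each requested line in one pass over f."""
    wanted = set(line_numbers)      # set gives fast membership tests
    if not wanted:
        return
    last = max(wanted)
    # start=2: line 1 is the header, so the first data row counts as line 2.
    for line_number, row in enumerate(csv.DictReader(f), start=2):
        if line_number in wanted:
            yield line_number, row
        if line_number >= last:
            break                   # stop early, nothing left to find

# Hypothetical in-memory stand-in for the real CSV file.
sample = io.StringIO(
    "Col 1,Col 2,Col3\n"
    "row11,row12,row13\n"
    "row21,row22,row23\n"
    "row31,row32,row33\n"
)

rows = dict(read_csv_rows(sample, [2, 4]))
for line_number in sorted(rows):
    print(rows[line_number]["Col 1"])
```

Because the function is a generator over the open file object, only one row is held in memory at a time, and the file is read once instead of once per entry in L. Note that it counts CSV records, which differ from physical lines if a quoted field contains a newline.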