I am trying to read the rows of a CSV file. My file looks like this:

Col 1, Col 2, Col3
row11, row12, row13
row21, row22, row23
row31, row32, row33
...

I use the following code to read the rows:

with open('~/data.csv') as f:
    r = csv.DictReader(f)
    for i in range(5):
        print(list(r)[i])

The output prints the first row, but then it gives an index out of range error right after.

IndexError                                Traceback (most recent call last)
<ipython-input-15-efcc4f8c760d> in <module>()
      2     r = csv.DictReader(f)
      3     for i in range(5):
----> 4         print(list(r)[i])

IndexError: list index out of range

I'm guessing I'm making a silly mistake somewhere, but can't spot it. Any ideas on what I am doing wrong and how to fix it?

EDIT: This is the output of print(list(r)):

[{'Col 1': 'row11', ' Col3': ' row13', ' Col 2': ' row12'}, {'Col 1': 'row21', ' Col3': ' row23', ' Col 2': ' row22'}, {'Col 1': 'row31', ' Col3': ' row33', ' Col 2': ' row32'}, {'Col 1': 'row41', ' Col3': ' row43', ' Col 2': ' row42'}, {'Col 1': 'row51', ' Col3': ' row53', ' Col 2': ' row52'}, {'Col 1': 'row61', ' Col3': ' row63', ' Col 2': ' row62'}, {'Col 1': 'row71', ' Col3': ' row73', ' Col 2': ' row72'}, {'Col 1': 'row81', ' Col3': ' row83', ' Col 2': ' row82'}, {'Col 1': 'row91', ' Col3': ' row93', ' Col 2': ' row92'}, {'Col 1': 'row101', ' Col3': ' row103', ' Col 2': ' row102'}]
K1.

1 Answer

csv.DictReader(f) only gives you one pass over your file. You can only call list on it once, but because the call is inside the loop you call it on every iteration. The first call consumes all the rows, so on the second iteration list(r) is empty and indexing it with [1] raises the IndexError. Just call list on it once, outside of the loop, save the result in a variable, and you'll be golden.

That is:

r = csv.DictReader(f)
rows = list(r)
for i in range(5):
    print(rows[i])
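
If you want to see the exhaustion for yourself, a small in-memory file is enough. This is only a sketch: io.StringIO stands in for the real file, and the sample data is made up for illustration.

```python
import csv
import io

# Made-up sample data standing in for the real file.
sample = "a,b\n1,2\n3,4\n"

reader = csv.DictReader(io.StringIO(sample))
first = list(reader)   # consumes every row
second = list(reader)  # nothing left to read

print(len(first))   # 2
print(len(second))  # 0
```

The second call doesn't raise anything, it just quietly returns an empty list, which is why the error only shows up when you index into it.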

Or, don't pull the whole thing into memory at any point:

for row in csv.DictReader(f):
    print(row)

If you'd like to keep the index around for other purposes:

for i, row in enumerate(csv.DictReader(f)):
    print(i, row)

If you want to get specific rows from an iterator (which csv.DictReader is a special case of) without pulling the whole thing into memory, check out itertools.islice at https://docs.python.org/3/library/itertools.html. It basically allows list-style slicing on an iterator.

import itertools

# prints the first five rows
for row in itertools.islice(csv.DictReader(f), 5):
    print(row)
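
islice also accepts a start and a stop, so you can take a window from the middle of the file. The rows before the window are still read and thrown away (a CSV has no random access), but they are never collected into a list. A small sketch, again with made-up in-memory data rather than your real file:

```python
import csv
import io
import itertools

# Made-up data: a header line "n" plus ten rows, 0..9.
sample = "n\n" + "".join(f"{i}\n" for i in range(10))

reader = csv.DictReader(io.StringIO(sample))
# Data rows with index 3, 4, 5 (0-based, the header is not counted).
window = [row["n"] for row in itertools.islice(reader, 3, 6)]
print(window)  # ['3', '4', '5']
```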

For more sporadic rows:

needed_row_indices = {2, 5, 20}
for i, row in enumerate(csv.DictReader(f)):
    if i in needed_row_indices:
        print(row)
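
With a very large file it may be worth stopping as soon as the last wanted row has been seen, rather than reading to the end. Here is one way that could look. Note that pick_rows is a name I made up for this sketch, not something from the csv module, and the data is in-memory purely for demonstration.

```python
import csv
import io

def pick_rows(f, wanted):
    """Yield (index, row) for each 0-based data-row index in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return
    last = max(wanted)
    for i, row in enumerate(csv.DictReader(f)):
        if i in wanted:
            yield i, row
        if i >= last:
            break  # nothing further is needed, so stop reading

# Made-up data: a header line "n" plus 100 rows, 0..99.
sample = "n\n" + "".join(f"{i}\n" for i in range(100))
picked = [(i, row["n"]) for i, row in pick_rows(io.StringIO(sample), {2, 5, 20})]
print(picked)  # [(2, '2'), (5, '5'), (20, '20')]
```

Rows always come back in file order, whatever order the indices were given in, because the file is only read once from top to bottom.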
jwilner
  • Thank you very much. While that works for a small example, my problem is that I am working with a big data set, about 10 GB. Any idea how to avoid loading the whole thing? – K1. Apr 10 '15 at 05:34
  • Updated to show how you can avoid pulling the whole thing into memory. @Keivan – jwilner Apr 10 '15 at 05:35
  • Thank you again. Would you happen to have some idea of how to read several specific rows of the file this way? That was originally my question and I was trying to simplify it, so I got to this point. This does solve the question asked above, though. – K1. Apr 10 '15 at 05:41
  • @Keivan check out `itertools`, which I explain a little in the post. – jwilner Apr 10 '15 at 05:44
  • Thanks, that helps a lot. It still has some problems though: it seems like it reads the rows and adds up the index. Basically I need to read, for example, line 2, then line 5, then line 20, and so on in the file. I am finding these 2, 5, 20 etc. by some calculations. Can I call them directly somehow? – K1. Apr 10 '15 at 06:00
  • @Keivan, if the rows you need are more sporadically distributed like that, I would suggest you a) build a set of the rows you need -- for example, `needed_rows = {2, 5, 20}` -- and then b) use the above example with `enumerate` but only act on the row when `i in needed_rows` – jwilner Apr 10 '15 at 06:11
  • Yeah, that makes sense. For some reason I was trying to avoid making a new list, probably because I was afraid of it getting too big, but what you say makes sense. I'll do that. Thanks again. – K1. Apr 10 '15 at 06:13
  • I mean, if there's a formula or other function that can serve as the test, you can of course use that instead of the set membership -- but if it's basically a hardcoded set of randomly distributed rows, not sure how much better you can do. – jwilner Apr 10 '15 at 06:14
  • Yes, it's basically random, I find these row numbers in a data set, and then have to find corresponding data to those rows in the second data set. – K1. Apr 10 '15 at 06:16