
I'm trying to open large .csv files (16k+ lines, ~15 columns) in a Python script, and am having some issues.

I use the built-in open() function to open the file, then declare a csv.DictReader on the input file. The loop is structured like this:

for i, row in enumerate(reader):
    # do stuff (send serial packet, read response)
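
Fleshed out, the relevant part of the script looks roughly like this (the filename is a placeholder and the loop body stands in for my real code):

import csv

infile = open("test.csv", "rb")   # "test.csv" is a placeholder; "rb" per the Python 2 csv docs
reader = csv.DictReader(infile)
for i, row in enumerate(reader):
    # do stuff (send serial packet, read response)
    pass
infile.close()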

However, if I use a file longer than about 20 lines, the file will open, but within a few iterations I get a ValueError: I/O operation on a closed file.

My thought is that I might be running out of memory (though the 16k-line file is only 8 MB, and I have 3 GB of RAM), in which case I expect I'll need to use some sort of buffer to load only sections of the file into memory at a time.
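
By a buffer I mean something like pulling rows from the reader in fixed-size batches, e.g. with itertools.islice (the batch size here is arbitrary):

import itertools

while True:
    batch = list(itertools.islice(reader, 1000))  # read up to 1000 rows per batch
    if not batch:
        break
    for row in batch:
        pass  # do stuff with each row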

Am I on the right track? Or could there be other causes for the file closing unexpectedly?

Edit: about half the time I run this with an 11-line CSV, it gives me the ValueError. The error does not always happen at the same line.

Trey
  • Do you get the same problem using csv.reader and just iterating with for row in reader: do_stuff()? That's a relatively small file to be experiencing that type of problem. – jcomeau_ictx Jun 15 '11 at 22:53
  • Yes, I have the same error when I just use `for row in reader`. – Trey Jun 15 '11 at 23:01
  • It's very unlikely that you're running out of memory. Are other processes acting on the file? Are you opening the file in the correct mode? If you use a 20-line file, do you get the expected results? What does "send serial packet" mean in your comment above -- is it possible that the I/O error is coming from that step rather than from the CSV reader itself? Providing a complete traceback is always good (see the sketch just after these comments). – Russell Borogove Jun 15 '11 at 23:11
  • Your CSV file is tiny. The error has nothing to do with size, and likely nothing to do with CSV files at all. Show ALL of your code. Show the full traceback. – John Machin Jun 16 '11 at 01:26
  • @Russell - There are no other processes acting on the file. Using the 20-line file I don't get the expected (working) results. My script also uses the pySerial module to send serial packets to an embedded processor, which is what I meant in the comment. Will post traceback asap... – Trey Jun 16 '11 at 21:28
  • Are you absolutely certain the CSV file is correctly formatted? An unmatched quote partway through could cause the error you describe. – user340140 Oct 14 '13 at 00:35
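
Following up on Russell's traceback suggestion, one way to tell whether the error comes from the serial step or from the csv reader is to wrap just the per-row work in a try/except and print the full traceback. A minimal sketch; do_stuff is a hypothetical stand-in for the serial send/read:

import csv
import traceback

infile = open("test.csv", "rb")   # placeholder filename
reader = csv.DictReader(infile)
for i, row in enumerate(reader):
    try:
        do_stuff(row)             # hypothetical serial send/read step
    except Exception:
        print "row %d raised:" % i
        traceback.print_exc()     # the full traceback shows where the error really originates
        raise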

2 Answers


16k lines is nothing for 3GB of RAM; most probably your problem is something else, e.g. you are taking too much time in some other step and something interferes with the opened file. Just to be sure, and in any case for speed when you have 3GB of RAM, load the whole file into memory and then parse it, e.g.:

import csv
import cStringIO

# Read the whole file into memory first, then let the csv module parse
# from an in-memory buffer instead of the file object.
data = open("/tmp/1.csv").read()
reader = csv.DictReader(cStringIO.StringIO(data))
for row in reader:
    print row

With this, at least, you shouldn't get the closed-file error.
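
Equivalently, you can parse every row up front and close the file before doing any other work (a sketch using the same placeholder path):

import csv

f = open("/tmp/1.csv")
rows = list(csv.DictReader(f))   # consume the whole reader while the file is open
f.close()

for row in rows:
    print row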

Anurag Uniyal

csv.reader is faster. Read the whole file in blocks. To avoid the memory leak, it is better to use a subprocess:

from multiprocessing import Process

def child_process(resource):
    # Do the read-and-process work here.
    pass

if __name__ == '__main__':
    # Get the file object resource.
    # .....
    p = Process(target=child_process, args=(resource,))
    p.start()
    p.join()
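
A runnable version of this pattern might look like the following sketch; the path is a placeholder, and the file is opened inside the child because open file objects don't pass cleanly between processes:

from multiprocessing import Process
import csv

def child_process(path):
    # Open and parse the CSV entirely inside the child, so any memory
    # it leaks is reclaimed by the OS when the child exits.
    with open(path) as f:
        for row in csv.DictReader(f):
            pass  # process each row here

if __name__ == '__main__':
    p = Process(target=child_process, args=("/tmp/1.csv",))
    p.start()
    p.join()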

For more information, please go through this link: http://articlesdictionary.wordpress.com/2013/09/29/read-csv-file-in-python/