I'm trying to iterate over the lines of a CSV. For each line, I want to do a bunch of work, save that line to a destination CSV, and remove it from the original CSV, saving both the origin and destination files after every line (to save state in case of a crash). Is there an elegant way of doing this that doesn't involve opening and closing the files at every step?
- Maybe using `sqlite` would be more appropriate? – Andrej Kesely Jul 21 '22 at 19:03
- Some discussion [here](https://stackoverflow.com/questions/18984092/python-2-7-write-to-file-instantly) about using `flush()` to actually write to the file while in progress, and some discussion [here](https://stackoverflow.com/questions/17577137/do-files-get-closed-during-an-exception-exit) about using exceptions (`try:except:`) and the `with` context manager to account for writing and closing the file in case of an _anticipated_ exception. I think much of this depends on what sort of "crash" you expect – G. Anderson Jul 21 '22 at 22:22
- Yes, sqlite would be awesome, but unfortunately this is about processing CSV files, so I can't really do that. Hoping to move over to an API-based system in the near future. – Andres Jul 22 '22 at 01:51
- OK, it seems like what I'm trying to do is not really standard or kosher... it's a nice-to-have, so I guess I'll just process entire files before moving them over – Andres Jul 22 '22 at 01:52
1 Answer
To write to a file immediately, reduce or disable buffering when you open it. Note that in Python 3, fully unbuffered I/O (`buffering=0`) is only allowed in binary mode; for a text-mode file, line buffering (`buffering=1`) flushes after every line:

    with open("test.csv", "w", buffering=1) as my_file:
        ...
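Alternatively, you can keep default buffering and flush explicitly after each row; `os.fsync` additionally asks the OS to commit the data to disk. This is just a sketch (the file name and the `rows` placeholder are illustrative):

    import csv
    import os

    rows = [["a", 1], ["b", 2]]  # stand-in for whatever you are iterating over

    with open("out.csv", "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for row in rows:
            writer.writerow(row)
            out_file.flush()             # push Python's buffer to the OS
            os.fsync(out_file.fileno())  # ask the OS to commit it to disk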
This makes sense for the output file; repeatedly deleting the first line of the input is another matter. The only way to do that is to rewrite the entire remainder of the file, over and over (google "quadratic complexity"). That will definitely hurt performance, and it increases rather than reduces the chance that something will go wrong.
I strongly recommend leaving the input file alone and finding another way to keep track of how much has been processed. (E.g. write the number of processed lines somewhere else, and adapt your code to skip that many lines on restart; see the sketch below.)
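A minimal sketch of that idea (the file names and the `process` helper are made up for illustration): the count of completed rows lives in a small sidecar file that is rewritten after each row, so a restart resumes where the crash happened.

    import csv
    import os

    PROGRESS = "progress.txt"  # hypothetical sidecar file holding the row count

    def process(row):
        ...  # the real per-row work goes here

    # Read how many rows were already handled on a previous run.
    done = 0
    if os.path.exists(PROGRESS):
        with open(PROGRESS) as f:
            done = int(f.read() or 0)

    with open("input.csv", newline="") as src, \
         open("output.csv", "a", newline="") as dst:  # append: keep earlier output
        writer = csv.writer(dst)
        for i, row in enumerate(csv.reader(src)):
            if i < done:
                continue                 # already processed before the crash
            process(row)
            writer.writerow(row)
            dst.flush()
            with open(PROGRESS, "w") as f:
                f.write(str(i + 1))      # checkpoint after each row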
PS. If you wanted to get cute, you could process the input file from the end (last row first) and use `truncate` to delete each processed line without rewriting what comes before. But that's tricky to get right, and really it's not a good fit for your goal of simply tracking how far the processing has gotten. A rough sketch follows.
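To make the trickiness concrete, here is a hedged sketch of that approach (file name and `process` helper are hypothetical; it reads the whole file up front to learn line offsets, so it is illustrative rather than something to ship):

    import os

    def process(row_bytes):
        ...  # per-row work on the raw line

    path = "input.csv"  # hypothetical file name

    # Learn the byte offset at which each line starts.
    with open(path, "rb") as f:
        lines = f.readlines()
    offsets, pos = [], 0
    for line in lines:
        offsets.append(pos)
        pos += len(line)

    # Walk backwards: handle the last line, then chop it off the file.
    with open(path, "rb+") as f:
        for line, start in zip(reversed(lines), reversed(offsets)):
            process(line)
            f.truncate(start)    # file now ends just before the processed line
            os.fsync(f.fileno()) # commit the shrink to disk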
