
I have written a Python program that loops through a list of X files, opens each one, reads it line by line, and writes (appends) each line to an output file. Since these files are several GB each, it is taking very long.

I am looking for suggestions to improve the performance of this program. I have no formal CS training, so it's likely I am missing the "obvious solution" to this problem; I have done some research, but my limited knowledge (and other higher-priority tasks) limits my ability to implement it. This is also my first post on Stack Overflow. Thank you in advance.

for name in PR_files:
    with open(PR_path + name, 'r') as f:
        line = f.readline()
        while line:
            with open(PR_out_path, 'a') as g:
                g.write(line + '\n')
                line = f.readline()
    f.close()

The above program works, but it leaves a blank line between each line in the output text file. This happened because the first line of the next file would otherwise begin on the last line of the previous file, so my solution was to add '\n' to each line being written to the output file. For that reason I wrote another block to remove all blank lines from the output file (yes, very inefficient; there is probably a much better way to do this):

# this removes all blank lines from the output file
with open(PR_out_path) as this, open(PR_out_path_fix, 'w') as that:
    for line in this:
        if not line.strip():
            continue
        that.write(line)
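
For what it's worth, I believe readline() keeps the trailing '\n' on every line except possibly the last line of a file, so appending another '\n' is what creates the blank lines. A single-pass sketch that normalizes each line instead, so no cleanup pass is needed (same variable names as above; I have not timed it):

with open(PR_out_path, 'w') as out:
    for name in PR_files:
        with open(PR_path + name, 'r') as f:
            for line in f:
                # strip any existing newline, then write exactly one
                out.write(line.rstrip('\n') + '\n')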
  • Maybe consider asking this question on the [code review SE](https://codereview.stackexchange.com/). It might be more appropriate for your question. – Leon Aug 24 '18 at 13:36
  • Thanks for suggesting this, Leon. I will do just that. – Danny Schult Aug 24 '18 at 14:01

1 Answer


Why do you want to append it line by line? What about appending the whole file instead, like this?

with open(PR_out_path, 'a') as g:
    for name in PR_files:
        with open(PR_path + name, 'r') as f:
            g.write(f.read())
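
If holding several GB in memory at once is a concern with f.read(), a chunked variant along the same lines copies each file in fixed-size blocks (a sketch, reusing the question's PR_files, PR_path and PR_out_path names):

import shutil

with open(PR_out_path, 'ab') as g:
    for name in PR_files:
        with open(PR_path + name, 'rb') as f:
            # copy in 1 MiB blocks rather than loading the whole file into memory
            shutil.copyfileobj(f, g, length=1024 * 1024)

Opening the files in binary mode also skips newline decoding and re-encoding, which should help a little at this size.
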
Sergio