append contents from one file to another with newline separation

Question

I'm trying to replicate the functionality of the Linux shell's cat in a platform-agnostic way, so that I can take two text files and merge their contents in the following manner:

file_1 contains:

42 bottles of beer on the wall

file_2 contains:

Beer is clearly the answer

Merged file should contain:

42 bottles of beer on the wall  
Beer is clearly the answer

Most of the techniques I've read about, however, end up producing:

42 bottles of beer on the wallBeer is clearly the answer

Another issue is that the actual files with which I'd like to work are incredibly large text files (FASTA formatted protein sequence files) such that I think most methods reading line-by-line are inefficient. Hence, I have been trying to figure out a solution using shutil, as below:

def concatenate_fasta(file1, file2, newfile):
    destination = open(newfile,'wb')
    shutil.copyfileobj(open(file1,'rb'), destination)
    destination.write('\n...\n')
    shutil.copyfileobj(open(file2,'rb'), destination)
    destination.close()

However, this produces the same problem as before, except with "..." in between. Clearly, the newlines are being ignored, but I'm at a loss as to how to manage this properly.

Any help would be most appreciated.

EDIT:

I tried Martijn's suggestion, but the line_sep value returned is None, which throws an error when the function attempts to write it to the output file. I have now gotten this working with the os.linesep method (mentioned as less optimal), as follows:

import os
import shutil

with open(newfile,'wb') as destination:
    with open(file_1,'rb') as source:
        shutil.copyfileobj(source, destination)
    destination.write(os.linesep*2)
    with open(file_2,'rb') as source:
        shutil.copyfileobj(source, destination)
    # no explicit close() needed: the with block closes destination

This gives me the functionality I need, but I'm still at a bit of a loss as to why the (seemingly more elegant) solution is failing.

This is not an answer, but the `file1`, `file2` parameters do not match `file_1`, `file_2` in the function body. — falsetru, Dec 16 '13 at 09:50
@falsetru Whoops, yes, that was my bad. Thanks for catching it. Corrected. — glarue, Dec 16 '13 at 17:34

score 4 · Accepted Answer · answered Dec 16 '13 at 10:02

You have opened the files in binary mode, so no newline translation will take place. Different platforms use different line endings, and if you are on Windows \n is not enough.
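
To illustrate the point about translation, here is a minimal sketch, assuming Python 3 (where open() accepts a newline argument). The helper name written_bytes is made up for this example, and newline='\r\n' is used only to reproduce on any platform what text mode does by default on Windows:

```python
import os
import tempfile

def written_bytes(mode, data, **open_kwargs):
    """Write data to a temporary file in the given mode and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, **open_kwargs) as handle:
            handle.write(data)
        with open(path, 'rb') as handle:
            return handle.read()
    finally:
        os.remove(path)

# Binary mode: the bytes land on disk exactly as given, on every platform.
binary_result = written_bytes('wb', b'line\n')

# Text mode with newline='\r\n': every '\n' written is translated to '\r\n'.
text_result = written_bytes('w', 'line\n', newline='\r\n')

print(binary_result)  # b'line\n'
print(text_result)    # b'line\r\n'
```

So a '\n' written through a binary-mode handle stays a bare LF, which some Windows editors will not show as a line break.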

The simplest method would be to write os.linesep here:

destination.write(os.linesep + '...' + os.linesep)

but this could violate the actual newline convention used in the files.

The better approach would be to open the text files in text mode, read a line or two, then inspect the file.newlines attribute to see what the convention is for that file:

import shutil

def concatenate_fasta(file_1, file_2, newfile):
    with open(file_1, 'r') as source:
        next(source, None)  # try to read a line
        line_sep = source.newlines
        if isinstance(line_sep, tuple):
            # mixed newlines, let's just pick the first one
            line_sep = line_sep[0]

    with open(newfile,'wb') as destination:
        with open(file_1,'rb') as source:
            shutil.copyfileobj(source, destination)
        destination.write(line_sep + '...' + line_sep)

        with open(file_2,'rb') as source:
            shutil.copyfileobj(source, destination)

You may want to test file_2 as well, perhaps raising an exception if the newline convention used doesn't match the first file.
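
A rough sketch of that check, assuming Python 3, where text-mode files track newlines by default (under Python 2 the attribute is only populated for files opened in universal-newline mode, such as 'rU'). The names newline_of and concatenate_checked are invented for this example, and decoding as latin-1 is an assumption made only so that any byte can be read without a decode error:

```python
import shutil

def newline_of(path):
    """Return the newline convention seen in the first chunk of path, or None."""
    # latin-1 maps every byte to a character, so decoding cannot fail (assumption)
    with open(path, 'r', encoding='latin-1') as source:
        next(source, None)           # read one line so `newlines` gets populated
        seps = source.newlines
    if isinstance(seps, tuple):      # mixed endings: pick the first, as above
        seps = seps[0]
    return seps

def concatenate_checked(file_1, file_2, newfile, marker='...'):
    sep_1, sep_2 = newline_of(file_1), newline_of(file_2)
    if sep_1 is None:
        raise ValueError('no newline found in %s' % file_1)
    if sep_2 is not None and sep_2 != sep_1:
        raise ValueError('newline conventions differ: %r vs %r' % (sep_1, sep_2))
    with open(newfile, 'wb') as destination:
        with open(file_1, 'rb') as source:
            shutil.copyfileobj(source, destination)
        # the output is opened in binary mode, so the separator must be bytes
        destination.write((sep_1 + marker + sep_1).encode('ascii'))
        with open(file_2, 'rb') as source:
            shutil.copyfileobj(source, destination)
```

Raising on a mismatch is only one option; converting the second file to the first file's convention would also work, at the cost of no longer being a plain byte copy.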

Thanks very much for the thorough answer; I will give this a shot today. I saved this part of my script for last, thinking "it must be trivial to combine the two files together". Live and learn. — glarue, Dec 16 '13 at 17:36
Didn't quite work (see updated OP); your other suggestion seems to have done the trick, although I wonder if it might break something down the line as you caution. — glarue, Dec 16 '13 at 18:39
@glarue: It could be that you need to read more than one line; perhaps make it a loop reading a few lines until `source.newlines` is set, with a maximum number of lines before you give up. — Martijn Pieters, Dec 16 '13 at 18:41
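
The bounded loop suggested in the last comment could look roughly like this. It is only a sketch, assuming Python 3; sniff_newline, max_lines and default are hypothetical names, and falling back to os.linesep when nothing is detected is one possible choice, not something the answer prescribes:

```python
import os

def sniff_newline(path, max_lines=10, default=os.linesep):
    """Return the first newline convention seen within max_lines, else default."""
    # latin-1 is assumed only so that arbitrary bytes decode without error
    with open(path, 'r', encoding='latin-1') as source:
        for _ in range(max_lines):
            line = source.readline()
            if not line or source.newlines is not None:
                break              # end of file reached, or a convention was detected
        seps = source.newlines
    if isinstance(seps, tuple):
        seps = seps[0]             # mixed endings: pick the first one reported
    return seps if seps is not None else default
```

In Python 3 a single read is usually enough, so the loop mainly acts as a defensive bound; the fallback matters for files that contain no newline at all.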

score 1 · Answer 2 · answered Dec 16 '13 at 10:10

It seems that your source files may not end with a newline. In that case, it would help to read the last character (or more, depending on your platform) of the first file to check whether it is the newline sequence os.linesep, and to add a newline to the output file if it is not.

import os
import shutil

with open("file1.txt",'rb') as fin1, \
     open("file2.txt",'rb') as fin2, \
     open("file3.txt",'wb') as fout:
    shutil.copyfileobj(fin1, fout)
    # jump back to the tail of the first file and check for a trailing newline
    fin1.seek(-len(os.linesep), 2)
    if fin1.read() != os.linesep:
        fout.write(os.linesep)
    shutil.copyfileobj(fin2, fout)
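
For Python 3, where binary files yield bytes rather than str, the same idea might be sketched as follows. The function names and the default separator b'\n' are assumptions made for this example; only the last byte is inspected, which covers LF, CRLF and CR endings and also avoids seeking before the start of a very short file:

```python
import os
import shutil

def ends_with_newline(handle):
    """Return True if the binary file object is empty or ends with a CR or LF byte."""
    handle.seek(0, os.SEEK_END)
    if handle.tell() == 0:
        return True                      # empty file: nothing to separate
    handle.seek(-1, os.SEEK_END)
    return handle.read(1) in (b'\n', b'\r')

def concatenate_with_separator(file_1, file_2, newfile, sep=b'\n'):
    with open(file_1, 'rb') as fin1, \
         open(file_2, 'rb') as fin2, \
         open(newfile, 'wb') as fout:
        shutil.copyfileobj(fin1, fout)   # chunked copy, never loads the whole file
        if not ends_with_newline(fin1):
            fout.write(sep)              # add a separator only when one is missing
        shutil.copyfileobj(fin2, fout)
```

Passing sep=os.linesep.encode() would follow the platform convention instead of a bare LF.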

score 0 · Answer 3 · answered Oct 27 '17 at 19:11

from sys import argv

script, from_file, to_file = argv

print "Copying from %s to %s" % (from_file, to_file)

# read the whole source file into memory
in_file = open(from_file, 'rb')
indata = in_file.read()

print "Ready, hit RETURN/ENTER to continue, CTRL-C to abort."
raw_input()

# open the destination in append mode so its existing contents are kept
out_file = open(to_file, 'a')

out_file.write(indata)
print "Alright, all done."

out_file.close()
in_file.close()
