
I have an origin file (fo.log) like below:

title1  title2  title3
o11     o12     o13
o21     o22     o23
o31     o32     o33

And a destination file (fd.log) like below:

d11   d12
d21   d22
d31   d32

Both files have the same number of lines (possibly millions of them), except that the origin file has an extra title line. Because of memory usage, I don't want to read all the lines into memory.

After processing my script, I would like to have the destination file (fd.log) like below:

d11   d12   o13
d21   d22   o23
d31   d32   o33

which means that I take the last field of each origin line and append it to the corresponding destination line.

Lines correspond between the two files only by position; the matching has nothing to do with their content.

The closest script I could come up with is written below, and it correctly prints the information I want:

from pathlib import Path

file_from = Path("path-to-origin-file/fo.log").open()
file_to = Path("path-to-destination-file/fd.log").open()

# create an enumerator to iterate over the destination file lines
eft = enumerate(file_to)

# skip the title line of the origin file
next(file_from)

for line_counter, line_to in eft:
    print(' '.join([
                line_to.rstrip('\n'),
                file_from.readline().split()[2]]))

file_from.close()
file_to.close()
Carlos Ost

3 Answers


For small enough files, you can prepare the new file contents as a list of lines (or a single string) and then write it back to the file, e.g.:

from pathlib import Path

with Path('in-file').open() as in_file, Path('out-file').open('r+') as out_file:
    lines = []

    # skip the title line of the input file and pair the remaining lines
    # with the output file's lines by position
    for line1, line2 in zip(in_file.readlines()[1:], out_file.readlines()):
        line = '{}   {}\n'.format(line2.rstrip(), line1.split()[-1])
        lines.append(line)

    out_file.seek(0)  # rewind the output file to the beginning
    out_file.writelines(lines)

For bigger files, consider writing to a backup file and then replacing the original, the way stdlib's fileinput module does.
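
A minimal sketch of that idea, assuming the fo.log/fd.log names from the question: fileinput.input(..., inplace=True) redirects print() into a fresh copy of the destination file and keeps the original as a .bak backup, so both files are only streamed line by line.

import fileinput

# Sketch only: file names are taken from the question.
# fileinput rewrites fd.log line by line (keeping the original as
# fd.log.bak) while fo.log is streamed alongside it, so neither file
# is read into memory as a whole.
with open('fo.log') as origin, \
        fileinput.input('fd.log', inplace=True, backup='.bak') as dest:
    next(origin)  # skip the title line of the origin file
    for dest_line in dest:
        # with inplace=True, print() writes into the new fd.log
        print(dest_line.rstrip('\n'), origin.readline().split()[-1])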

Eugene Yarmash
# assumes eft and file_from from the question's script are still open
with open('newfile.csv', 'w') as f:
    for line_counter, line_to in eft:
        f.write(' '.join([
            line_to.rstrip('\n'),
            file_from.readline().split()[2]]) + '\n')
amitchone
  • I've made an update to the question with the file names. I don't want to create a new file, but append to the existing one. – Carlos Ost Jul 20 '18 at 14:20
with open('text1.txt', 'r') as istr:
    with open('text2.txt', 'r+') as ostr:
        # read both files and pair the lines by position,
        # skipping the title line of the first file
        iistr = istr.readlines()
        oostr = ostr.readlines()
        fstr = zip(iistr[1:], oostr)
        output_lines = []
        for iline, oline in fstr:
            # Get rid of the trailing newline (if any).
            output_lines.append(oline.rstrip('\n') + '  ' + iline.split()[2] + '\n')

        # rewind and overwrite the second file with the combined lines
        ostr.seek(0)
        ostr.writelines(output_lines)
Jay
  • Thank you, but thinking about memory usage, I don't want to read all the lines into memory (I will update my question). What if I have 1 million lines to analyze? The logic I need is so simple, but maybe I will need an auxiliary file to do that :-( – Carlos Ost Jul 20 '18 at 14:41
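
For reference, a minimal sketch of the auxiliary-file idea mentioned in the comment above, assuming the fo.log/fd.log names from the question: stream both files line by line into a temporary file, then atomically replace the destination file, so nothing is held in memory and no extra file is left behind.

import os
import tempfile

# Sketch only: file names are taken from the question and the temporary
# file is created in the current directory so os.replace() stays on the
# same filesystem.
with open('fo.log') as origin, open('fd.log') as dest, \
        tempfile.NamedTemporaryFile('w', dir='.', delete=False) as tmp:
    next(origin)  # skip the title line of the origin file
    for origin_line, dest_line in zip(origin, dest):
        tmp.write('{} {}\n'.format(dest_line.rstrip('\n'),
                                   origin_line.split()[-1]))

os.replace(tmp.name, 'fd.log')  # swap the temporary file into place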