
I have 3 files:
file1:

A
B
C

file2:

10,20,30,40
40,20,50,60
60,20,80,90

file3:

20
30
40

How do I merge the files to generate a single file in the form:

A  10,20,30,40  20
B  40,20,50,60  30
C  60,20,80,90  40

Every column should be separated by '\t', that is, a tab and not a space.
I'm really new to Python and I'm not sure how to implement this. I have seen various examples on the net that simply concatenate the files without preserving the column pattern.
How do I preserve the pattern by separating the columns with a single tab character? Any relevant code would be really helpful. Thanks.

HackCode
    This question appears to be off-topic because there is no indication of an attempt to solve it yourself. – Veedrac May 26 '14 at 18:43

2 Answers


That's a good job for .join() and zip():

Assuming f1, f2 and f3 are handles to your input files, and output is a handle to your output, you can do

for items in zip(f1, f2, f3):
    output.write("\t".join(item.strip() for item in items) + "\n")

zip() zips all corresponding items (all the first lines, second lines, etc.) together, presenting them as a tuple.

join() joins them into a string, using \t as a separator.
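
As a small illustration of those two steps, here is a sketch that uses in-memory lists in place of the open files (the names lines1, lines2, lines3, rows and merged are made up for this example; the data is taken from the question):

```python
# Each list stands in for the lines of one input file, newline included.
lines1 = ["A\n", "B\n", "C\n"]
lines2 = ["10,20,30,40\n", "40,20,50,60\n", "60,20,80,90\n"]
lines3 = ["20\n", "30\n", "40\n"]

# zip() groups the first lines together, then the second lines, and so on.
rows = list(zip(lines1, lines2, lines3))
# rows[0] is ("A\n", "10,20,30,40\n", "20\n")

# Strip each item, then join the items of a row with a tab.
merged = ["\t".join(item.strip() for item in row) for row in rows]
# merged[0] is "A\t10,20,30,40\t20"
```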

Since we've read the lines from a file, each one (except possibly the last) ends in \n, so we need to strip those before joining the strings.
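
The snippet above assumes the files are already open. One possible way to put it together, opening and closing the files as well, might look like the sketch below. The function name merge_columns and its parameters are hypothetical, not part of the original answer, and contextlib.ExitStack is just one way to manage several files at once.

```python
from contextlib import ExitStack

def merge_columns(in_paths, out_path, sep="\t"):
    """Write line i of every input file, joined by sep, as line i of out_path."""
    with ExitStack() as stack:
        # ExitStack closes every file opened here when the block exits.
        inputs = [stack.enter_context(open(path)) for path in in_paths]
        output = stack.enter_context(open(out_path, "w"))
        # zip() stops at the end of the shortest input file.
        for items in zip(*inputs):
            output.write(sep.join(item.strip() for item in items) + "\n")
```

With the files from the question, a call such as merge_columns(["file1", "file2", "file3"], "merged.txt") should produce the three tab-separated lines asked for (merged.txt is an assumed output name).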

Tim Pietzcker
  • Nice to see Python's philosophy of providing a fairly obvious way to do things working out. I see significant similarities in our approaches. :) – Yann Vernier May 26 '14 at 16:34

While it's easy enough to do this in Python, there's a standard unix tool to do it as well. Just do paste file1 file2 file3 > singlefile. By default, paste separates the columns with a tab.

The same job within Python could be something like:

import itertools

def paste(outfile, separator="\t", *infiles):
    # zip_longest (named izip_longest on Python 2) pads shorter files with "".
    for line in itertools.zip_longest(*infiles, fillvalue=""):
        outfile.write(separator.join(column.rstrip("\n") for column in line) + "\n")

if __name__ == "__main__":
    import sys
    paste(sys.stdout, "\t", *map(open, sys.argv[1:]))
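
The choice of a "longest" zip only matters when the files have different numbers of lines. A rough sketch of the difference, assuming Python 3 (where izip_longest is called zip_longest) and using made-up in-memory lists (names, values, join_row) instead of files:

```python
from itertools import zip_longest

names = ["A\n", "B\n"]            # stands in for a two-line file
values = ["1\n", "2\n", "3\n"]    # stands in for a three-line file

def join_row(row):
    # Drop the trailing newline of each column, then join with a tab.
    return "\t".join(column.rstrip("\n") for column in row)

# zip() stops at the shortest input, so the third line of values is dropped.
truncated = [join_row(row) for row in zip(names, values)]

# zip_longest() keeps going and fills the missing column with "".
padded = [join_row(row) for row in zip_longest(names, values, fillvalue="")]
```

Here truncated has two rows, while padded has a third row whose first column is empty.
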
Yann Vernier