-1

I have two files: file 1 contains 2 columns and file 2 contains 5 columns. I want to remove the lines from file 2 that don't share common strings with file 1:

-file 1: if each line is split into a list, it contains [0] and [1]

gene-3  +
gene-2  -
gene-1  -

-file 2: compare [0] and [1] from file 1 against [0] and [4] of this file. If a line of file 2 matches no line of file 1, it must be removed.

gene-1  mga CDF 1   +  # this line contains + instead of -, although gene-1 is the same: remove.
gene-2  mga CDS 1   -  # [0][1] from file 1 = [0][4] from file 2: (gene-2, -), keep it!
gene-3  mga CDH 1   +  # same as above: (gene-3, +), keep it!
gene-4  mga CDS 1   +  # no gene-4 in file 1: remove.

-Desired output:

gene-3  mga CDH 1   +
gene-2  mga CDS 1   -

Any ideas?

  • Make a set of the distinct strings in file 1, then keep only the lines in file 2 that intersect it. – Padraic Cunningham Nov 11 '14 at 14:48
  • This is an interesting problem, but it's unclear which part you're having trouble with. Do you know how to read/write files? How to split lines on whitespace? – Tim Pietzcker Nov 11 '14 at 14:49
  • Actually, your example makes no sense: what are you matching line four in file2 against? There is a `+` and a `-` in file 1. – Padraic Cunningham Nov 11 '14 at 14:59
  • @PadraicCunningham I think the OP simply checks whether `gene-4` is in `file1` and, if not, removes it from `file2` too. In the other case, the OP just wants to keep the entries that have the same sign (`+` or `-`) at the end in both `file2` and `file1`. – ρss Nov 11 '14 at 15:01
  • I need to compare [0] and [1] from file 1 against [0] and [4] from file 2. If a line of file 2 doesn't match any line of file 1, remove it. – Peaceandlove Nov 11 '14 at 15:02
  • Looking at your question history, it seems that SO is doing your homework. – xbello Nov 11 '14 at 16:07
  • Hello xbello, I'm learning tons of things thanks to SO, so in fact I'm going to ask questions again, and again, and again... :) – Peaceandlove Nov 11 '14 at 23:34

2 Answers

1
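with open("file1.txt") as f, open("file2.txt") as f1:
    # Store each (gene, strand) pair from file 1 as a tuple, so the column spacing does not matter.
    items = set(tuple(line.split()) for line in f)
    # Keep the file 2 lines whose first and fifth columns form a pair seen in file 1.
    filtered = [line for line in f1 if tuple(line.split()[::4]) in items]
# Overwrite file 2 only after it has been fully read and closed.
with open("file2.txt", "w") as f3:
    f3.writelines(filtered)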
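The snippet above rewrites file2.txt in place. If losing the original on a crash is a concern, one possible variant writes the kept lines to a temporary file first and then swaps it in. This is only a sketch: the function name `filter_in_place` and its parameters are hypothetical, and it assumes file 2 has at least five whitespace-separated columns with no trailing comments.

```python
import os
import tempfile


def filter_in_place(pairs_path, data_path):
    """Keep only the lines of data_path whose (first, fifth) columns appear in pairs_path.

    Hypothetical helper for illustration; the names are not from the answer above.
    """
    with open(pairs_path) as f:
        # One (gene, strand) tuple per non-blank line of file 1.
        wanted = {tuple(line.split()) for line in f if line.strip()}

    directory = os.path.dirname(os.path.abspath(data_path))
    # Write the kept lines to a temporary file in the same directory ...
    with open(data_path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            cols = line.split()
            if len(cols) >= 5 and (cols[0], cols[4]) in wanted:
                tmp.write(line)
    # ... then swap it in, so the original is replaced only once the copy is complete.
    os.replace(tmp.name, data_path)
```

Calling `filter_in_place("file1.txt", "file2.txt")` would then leave only the matching lines in file2.txt, in their original order.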
Padraic Cunningham
0
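with open('file1', 'r') as f:
    # Each line of file 1 becomes a (gene, strand) tuple, e.g. ('gene-2', '-').
    keepers = set(tuple(line.split()) for line in f)
with open('file2', 'r') as f_in, open('file3', 'w') as f_out:
    for line in f_in:
        parts = line.split()
        # Skip blank lines; keep a line only if its first and last columns match a pair from file 1.
        if parts and (parts[0], parts[-1]) in keepers:
            f_out.write(line)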
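To check the logic without touching any files, the same idea can be run on the sample data from the question. This is a minimal sketch: `keep_matching`, `FILE1`, and `FILE2` are illustrative names, and the inline `#` annotations from the question are assumed not to be part of the real file.

```python
# Sample data copied from the question (annotations omitted).
FILE1 = """\
gene-3  +
gene-2  -
gene-1  -
"""

FILE2 = """\
gene-1  mga CDF 1   +
gene-2  mga CDS 1   -
gene-3  mga CDH 1   +
gene-4  mga CDS 1   +
"""


def keep_matching(pair_lines, data_lines):
    """Return the data lines whose (first, last) columns match a pair line."""
    keepers = {tuple(line.split()) for line in pair_lines if line.strip()}
    kept = []
    for line in data_lines:
        parts = line.split()
        if parts and (parts[0], parts[-1]) in keepers:
            kept.append(line)
    return kept


result = keep_matching(FILE1.splitlines(), FILE2.splitlines())
print("\n".join(result))
```

This keeps the gene-2 and gene-3 lines. Note that the output follows the order of file 2 (gene-2 first), not the order of file 1 shown in the desired output.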
Steven Rumbalski