Comparing 2 txt files in Python

Question

I have 2 txt files.

File A:

jack john jim
george colin stan

File B:

hell jack john jim goad tiger
tall jack jim john filer dom
hell george colin jim stab tiger
track jack george colin stan forever

I want each line of file A to be checked against every line of file B, word by word, returning True if the words match consecutively. E.g.:

jack from the first line of file A is taken and checked against the first line of file B; if it is found, then john is checked, and then jim. Then we move on to the second line of file B, and so on. After that we move on to the second line of file A and repeat the process. It should return True only if the matches are consecutive, so the first line of file B returns True because jack, john and jim appear in order, but the second line of file B returns False because they are not in the right order.

I have to process it word by word rather than treat the whole line as a single string, so each line has to be broken into words and then compared word by word.

Have you tried anything yourself? – Vivek Sable, Apr 08 '15 at 05:55

score 0 · Answer 1 · answered Apr 08 '15 at 05:45

If you have the words each on a separate line, you can do

in_file.readlines()

to get a list of the lines. If you have all the words on a single line separated by spaces, then do:

in_file.read().split()

Comparison should be straightforward, something like this:

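def compare(a_words, b_words):
    # Compare the words pairwise, position by position.
    # zip() stops at the shorter list (izip would need "from itertools import izip").
    for a_word, b_word in zip(a_words, b_words):
        if a_word != b_word:
            return False
    return True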

If you have multiple lines each with multiple words, then you should first read all the lines, then for each line call the compare function passing it words split from each line.
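
As a rough sketch of that last step, the loop below applies a positional compare() to every pair of lines. The sample lists are copied from the question; the names a_lines, b_lines and matches are made up for illustration. Note that zip() stops at the shorter list, so only words at identical positions are compared.

```python
def compare(a_words, b_words):
    # Positional comparison: word i of A against word i of B.
    # zip() stops at the shorter list, so extra words in B are ignored.
    for a_word, b_word in zip(a_words, b_words):
        if a_word != b_word:
            return False
    return True

# Sample data from the question, split into a list of words per line.
a_lines = [line.split() for line in [
    "jack john jim",
    "george colin stan",
]]
b_lines = [line.split() for line in [
    "hell jack john jim goad tiger",
    "tall jack jim john filer dom",
    "hell george colin jim stab tiger",
    "track jack george colin stan forever",
]]

# Collect every (A line, B line) pair that compare() accepts.
matches = [(a, b) for a in a_lines for b in b_lines if compare(a, b)]
print(matches)
```

With this data the list comes out empty, because no line of file B begins with the first word of a line in file A.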

answered Apr 08 '15 at 05:45

tecfreak

This will not produce the desired results, as it only matches words at exactly the same positions rather than at relative positions. – Droopy4096 Apr 08 '15 at 05:48
This is a simple comparison. What I actually need, as explained above, is to keep track of positions, i.e. if there is a first match, the next one should be the immediately following word in the second file, and so on. – Sb92 Apr 08 '15 at 05:52

Droopy4096 · Accepted Answer · 2015-04-08T14:19:43.417

Here's a memory-inefficient, brute-force-ish way of implementing the comparison function:

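def compare(list_a, list_b):
    # Nothing left to look for: every word of list_a was found in order.
    if not list_a:
        return True
    try:
        # Position of the next word of A within what is left of B.
        b_index = list_b.index(list_a[0])
    except ValueError:
        return False
    # Recurse on the rest of A, searching only after the match in B.
    # Note: this checks relative order, not strict adjacency.
    return compare(list_a[1:], list_b[b_index + 1:])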
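
The question asks for matches that are consecutive, whereas searching with index() only establishes that the words appear in order. If strict adjacency is what is needed, one possible approach is a sliding-window comparison. This is only a sketch: the name contains_consecutive is hypothetical, and treating an empty line as "no match" is an assumption.

```python
def contains_consecutive(a_words, b_words):
    # True if a_words occurs inside b_words as one unbroken run of words.
    n = len(a_words)
    if n == 0:
        return False  # assumption: an empty line never counts as a match
    # Slide a window of len(a_words) over b_words and compare the slices.
    for start in range(len(b_words) - n + 1):
        if b_words[start:start + n] == a_words:
            return True
    return False

a = "jack john jim".split()
print(contains_consecutive(a, "hell jack john jim goad tiger".split()))  # True
print(contains_consecutive(a, "tall jack jim john filer dom".split()))   # False
```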

You will need to read each file line by line, so there is going to be a nested loop calling the compare() function:

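# a_file and b_file are assumed to be file objects that are already open.
a_lines_raw = a_file.readlines()
b_lines_raw = b_file.readlines()

# Break every non-blank line into a list of words.
a_lines = [line.split() for line in a_lines_raw if line.strip()]
b_lines = [line.split() for line in b_lines_raw if line.strip()]

# Check each line of A against every line of B.
for a_line in a_lines:
    for b_line in b_lines:
        if compare(a_line, b_line):
            print("Match: %s %s" % (a_line, b_line))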

One can optimize this by passing indexes instead of slices and making list_a and list_b "global" to the compare() function, either by really making them global or by wrapping compare() in another function that defines list_a and list_b and passes only indexes to an inner function. Last but not least, it could be implemented as a class that stores list_a and list_b as attributes and has compare() as a method accepting indexes, both defaulting to 0.
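
One possible reading of the wrapper variant is sketched below. The inner function name helper is hypothetical. Like any index()-based search, it checks that the words appear in order, not that they are adjacent. It relies on the optional start argument of list.index(), so no list slices are copied.

```python
def compare(list_a, list_b):
    # Wrapper: helper() sees list_a and list_b through the closure,
    # so only two integer indexes are passed from call to call.
    def helper(a_index, b_index):
        if a_index == len(list_a):
            return True  # every word of list_a was found in order
        try:
            # list.index(value, start) searches from position start onwards.
            found = list_b.index(list_a[a_index], b_index)
        except ValueError:
            return False
        # Move on to the next word of A, searching after the match in B.
        return helper(a_index + 1, found + 1)
    return helper(0, 0)

a = "jack john jim".split()
print(compare(a, "hell jack john jim goad tiger".split()))  # True
print(compare(a, "tall jack jim john filer dom".split()))   # False
```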
