I'm working on a project with OCR. After some operations I have two strings like this:
s1 = "This text is a test of"
s2 = "a test of the reading device"
I would like to know how I can remove the repeated words from the second string. My idea is to find the position of each repeated word in both lists. I tried this:
e1 = s1.split()  # words of the first string
e2 = s2.split()  # words of the second string
for i, item2 in enumerate(e2):
    if item2 in e1:
        print i, item2  # repeated word and its index in the second list
        print e1.index(item2)  # index of its first occurrence in the first list
Now I have the repeated words and their positions in the first and second lists. I need this to check, word by word, whether the repeated words appear in the same order, because the same word may appear two or more times in a string (future validation).
In the end I would like to have final strings like these:
ns2 = "the reading device"
sf = "This text is a test of the reading device"
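To make the expected behaviour concrete, here is a minimal sketch of what I'm after. It assumes the repeated words always sit at the end of s1 and at the start of s2, which may not hold for every OCR result. The function name merge_overlap is only something I made up for illustration.

```python
def merge_overlap(s1, s2):
    """Return (ns2, sf): s2 without the leading words that repeat
    the end of s1, and the two strings merged without the repeat."""
    e1 = s1.split()
    e2 = s2.split()
    # Assumption: the overlap is a suffix of e1 and a prefix of e2.
    # Try the longest candidate first and compare whole word slices,
    # so a word that occurs several times cannot cause a false match.
    for k in range(min(len(e1), len(e2)), 0, -1):
        if e1[-k:] == e2[:k]:
            break
    else:
        k = 0  # no overlapping words found
    ns2 = " ".join(e2[k:])
    sf = " ".join(e1 + e2[k:])
    return ns2, sf


ns2, sf = merge_overlap("This text is a test of",
                        "a test of the reading device")
# ns2 == "the reading device"
# sf  == "This text is a test of the reading device"
```

Comparing slices checks the word order directly, so no separate index bookkeeping is needed. The sketch avoids print, so it should behave the same on Python 2.7 and Python 3.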
I'm using python 2.7 on Windows 7.