How to remove repeated words between two strings in python?

Question

I'm working on an OCR project. After some processing I have two strings like these:

s1 = "This text is a test of"
s2 = "a test of the reading device"

I would like to know how I can remove the repeated words from the second string. My idea is to find the position of each repeated word in both lists. I tried this:

e1 = s1.split()
e2 = s2.split()

for i, item2 in enumerate(e2):
    if item2 in e1:
        print i, item2  # repeated word and its index in the second string
        print e1.index(item2)  # index of its first occurrence in the first string

Now I have the repeated words and their positions in the first and second lists. I need to compare them word by word to check whether they appear in the same order, because the same word may appear two or more times in a string (future validation).

In the end I would like to get these final strings:

ns2 = "the reading device"
sf = "This text is a test of the reading device"

I'm using python 2.7 on Windows 7.

Documentation exists. Please use it. https://docs.python.org/3.6/tutorial/datastructures.html#more-on-lists — , Jan 11 '17 at 06:29

score 2 · Accepted Answer · answered Jan 11 '17 at 06:38

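Here is another approach, using difflib to find the longest block that the two strings share:

from difflib import SequenceMatcher as sq
# Longest common block, compared character by character
match = sq(None, s1, s2).find_longest_match(0, len(s1), 0, len(s2))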

Result

print s1 + s2[match.b+match.size:]

This text is a test of the reading device
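
Because the call above compares characters, the longest shared block is not guaranteed to fall on word boundaries, or to sit at the end of s1 and the start of s2. A minimal word-level sketch, assuming the repeated words always form the end of s1 and the start of s2; `split_overlap` is a hypothetical helper name, and the single-argument `print(...)` calls are written so the snippet runs on both Python 2.7 and Python 3:

```python
def split_overlap(s1, s2):
    """Return (ns2, sf): s2 without the leading words that repeat the
    end of s1, and the two strings merged without the repetition.

    Assumption: the overlap is a suffix of s1 and a prefix of s2.
    """
    w1, w2 = s1.split(), s2.split()
    # Try the longest possible overlap first, then shrink it.
    for size in range(min(len(w1), len(w2)), 0, -1):
        if w1[-size:] == w2[:size]:
            rest = w2[size:]
            return " ".join(rest), " ".join(w1 + rest)
    # No overlapping words: s2 is kept whole.
    return " ".join(w2), " ".join(w1 + w2)


s1 = "This text is a test of"
s2 = "a test of the reading device"
ns2, sf = split_overlap(s1, s2)
print(ns2)  # the reading device
print(sf)   # This text is a test of the reading device
```

Since whole runs of words are compared in order, a word that appears again later in s2 is left alone.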

answered Jan 11 '17 at 06:38

Rahul K P


It works fine but what happens if I have something like [that](http://stackoverflow.com/questions/41624787/how-to-delete-invalid-characters-between-multiple-strings-in-python/41624839#41624839). I hope that you can help me! – Alex Ortega Jan 12 '17 at 23:49

score 0 · Answer 2 · answered Jan 11 '17 at 07:14

Maybe this?

' '.join([x for x in s1.split(' ')] + [y for y in s2.split(' ') if y not in s1.split(' ')])

I haven't tested it carefully, but it may be a good starting point for this kind of requirement.

answered Jan 11 '17 at 07:14

Hou Lu


It will remove every occurrence of a repeated word from the second string. – Rahul K P Jan 11 '17 at 07:45
Won't it only remove those that already exist in the first one? – Hou Lu Jan 11 '17 at 07:55
Try the inputs `s1 = 'hi are you there'` and `s2 = 'you there hi for'`, and it will make sense. – Rahul K P Jan 11 '17 at 08:14
As @RahulKP said, it doesn't work when the same word appears two or more times in the string. – Alex Ortega Jan 11 '17 at 16:24
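
To make the objection in these comments concrete, here is a short check of the membership filter from this answer on the inputs suggested above. The variable names `words1` and `filtered` are introduced only for this illustration:

```python
s1 = 'hi are you there'
s2 = 'you there hi for'

words1 = s1.split(' ')
# Same idea as the one-liner in the answer: keep a word of s2 only if
# it does not occur anywhere in s1.
filtered = ' '.join(words1 + [y for y in s2.split(' ') if y not in words1])

print(filtered)  # hi are you there for
```

The overlapping words are only `you there`, so the merged string should keep the second `hi`; the filter drops it because it tests membership anywhere in s1 rather than position and order.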
