There are two lists to be matched: li_a is the given list, which consists of the sequence of characters of a sentence, whereas li_b is a collection of words.
li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
The process is to match the items of li_a iteratively against the items of li_b. If the first item of li_a matches an item of li_b, it is joined with the next item and the check is repeated until the longest match is reached. That longest term is then split off, and the process continues until the end of li_a. Characters and words of li_a that do not appear in li_b (such as '45') should be appended unchanged.
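A minimal sketch of that process in Python, assuming (as in the example) that li_b lists every prefix of each word. The function name `stepwise_match` is hypothetical. It joins the next item while the joined text is still in li_b, splits off the result once it can grow no further, and copies unknown items such as '45' through unchanged.

```python
def stepwise_match(chars, words):
    """Join consecutive items of `chars` while the joined text is still in `words`.

    Assumes, as in the example li_b, that `words` lists every prefix of each word.
    Items that are not in `words` are copied to the result unchanged.
    """
    vocab = set(words)  # set for fast membership checks
    result = []
    i = 0
    while i < len(chars):
        if chars[i] not in vocab:
            result.append(chars[i])  # unknown item, e.g. '45'
            i += 1
            continue
        current = chars[i]
        j = i + 1
        # Keep joining the next item while the longer text is still a known entry.
        while j < len(chars) and current + chars[j] in vocab:
            current += chars[j]
            j += 1
        result.append(current)  # longest match reached: split it off
        i = j
    return result


li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
print(stepwise_match(li_a, li_b))  # ['These', '45', 'are', 'cars']
```

Because extension stops at the first joined text missing from li_b, this sketch depends on that prefix listing: with li_b = ['These', 'are', 'cars'] it returns li_a unchanged.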
The final result should look like this:
new_li = ['These','45','are','cars']
This is what I have tried so far, but it works on two strings rather than on lists, and it does not retrieve the unidentified words:
def longest_matched_substring(s1, s2):
    # m[x][y] = length of the common substring ending at s1[x - 1] and s2[y - 1]
    m = [[0] * (1 + len(s2)) for _ in range(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in range(1, 1 + len(s1)):
        for y in range(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0
    # Slice the longest common substring out of s1
    return s1[x_longest - longest: x_longest]
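The function above compares two strings and returns only their single longest common substring, so it cannot split a list into words. One way to work on the lists directly is a greedy longest-match (maximal munch) scan, sketched below. The name `prefix_match` is hypothetical, and this sketch does not need li_b to list every prefix: it derives the prefixes of each complete word itself, keeps joining items while the text is still a prefix of some word, and emits the longest complete word seen. Items that start no known word are copied through unchanged.

```python
def prefix_match(chars, words):
    """Greedy longest-match segmentation of `chars` against `words`.

    Only complete words are needed in `words`; their prefixes are derived here.
    Items that do not start any known word are copied to the result unchanged.
    """
    vocab = set(words)
    # Every non-empty prefix of every word, so extension can stop early.
    prefixes = {w[:k] for w in vocab for k in range(1, len(w) + 1)}
    result = []
    i = 0
    while i < len(chars):
        candidate = ""
        best_end = None  # exclusive end index of the longest whole word seen so far
        j = i
        # Join the next item only while the text can still grow into some word.
        while j < len(chars) and candidate + chars[j] in prefixes:
            candidate += chars[j]
            j += 1
            if candidate in vocab:
                best_end = j
        if best_end is None:
            result.append(chars[i])  # no word starts here: keep the item as is
            i += 1
        else:
            result.append("".join(chars[i:best_end]))
            i = best_end
    return result


li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
print(prefix_match(li_a, ['These', 'are', 'cars']))  # ['These', '45', 'are', 'cars']
```

Stopping at the first text that is not a prefix loses no match, since any longer word would have to start with that text. Passing the original li_b instead gives the same result.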