
There are two lists to be matched: li_a is a given list holding the characters of a sentence in sequence, and li_b is a collection of words.

li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']


li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']

The process is to match the items of li_a against the items of li_b iteratively. If the first character of li_a matches an item in li_b, it is joined with the next character, and the process repeats until the longest match is reached. That longest term is then split off, and the process continues until the end of the list. Unknown characters and words in li_a that do not appear in li_b should be appended unchanged.

The final result should look like this:

new_li = ['These','45','are','cars']

This is my attempt so far, but it works on two strings rather than lists, and it does not keep the unidentified words.

def longest_matched_substring(s1, s2):
    # m[x][y] = length of the common substring ending at s1[x-1] and s2[y-1]
    m = [[0] * (1 + len(s2)) for _ in range(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in range(1, 1 + len(s1)):
        for y in range(1, 1 + len(s2)):
            if s1[x - 1] == s2[y - 1]:
                m[x][y] = m[x - 1][y - 1] + 1
                if m[x][y] > longest:
                    longest = m[x][y]
                    x_longest = x
            else:
                m[x][y] = 0

    return s1[x_longest - longest: x_longest]
bunji
Htet

1 Answer


You can do it using two for loops and a temp variable, as follows:

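def longest_matched_substring(li1, li2):
    new_li = []
    tmp = ''
    for a in li1:
        tmp += a
        if tmp not in li2:
            # tmp stopped matching: emit what was accumulated before `a`.
            # Slice off only the trailing item; str.replace would remove
            # every occurrence of `a` inside tmp.
            prev = tmp[:-len(a)]
            if prev:
                new_li.append(prev)
            tmp = a
    if tmp:
        # keep the last group, even if it is not a known word
        new_li.append(tmp)
    return new_li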

INPUT:

li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
print(longest_matched_substring(li_a, li_b))

OUTPUT:

['These', '45', 'are', 'cars']
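
One detail to watch in this approach is how the last item is stripped from the accumulated string. `str.replace` removes every occurrence of the item, whereas slicing removes only the trailing one. A minimal illustration (the values of `tmp` and `a` here are made up for the example, not taken from the lists above):

```python
tmp = 'cac'   # hypothetical accumulated string
a = 'c'       # hypothetical last item that was appended to tmp

removed_everywhere = tmp.replace(a, '')   # strips both 'c' characters
removed_trailing = tmp[:-len(a)]          # strips only the last item

print(removed_everywhere)  # a
print(removed_trailing)    # ca
```

With the sample lists the two give the same result, so the difference only shows up when an item repeats inside the accumulated string.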

As for the new scenario, where an intermediate prefix such as 'Thes' is missing from li_b, you can modify the function as follows:

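def longest_matched_substring(li1, li2):
    new_li = []
    tmp = ''
    # First pass: same greedy grouping as in the first version.
    for a in li1:
        tmp += a
        if tmp not in li2:
            # Slice off only the trailing item; str.replace would remove
            # every occurrence of `a` inside tmp.
            prev = tmp[:-len(a)]
            if prev:
                new_li.append(prev)
            tmp = a
    if tmp:
        # keep the last group, even if it is not a known word
        new_li.append(tmp)
    # Second pass: for each piece, append the other pieces in list order
    # until the concatenation is a known word, then swap the pieces used
    # for that word. Note that new_li is modified while it is iterated.
    for e1 in new_li:
        tmp2 = e1
        rm = []
        for e2 in new_li:
            if e1 != e2:
                tmp2 += e2
                rm.append(e2)
                if tmp2 in li2:
                    new_li.insert(new_li.index(e1), tmp2) # if order matters
                    #new_li.append(tmp2) if order doesn't matter
                    for r in rm:
                        new_li.remove(r)
                    new_li.remove(e1)
                    rm = []
                    break
    return new_li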

INPUT:

li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','These','a','ar','are','c','ca','car','cars']
print(longest_matched_substring(li_a, li_b))

OUTPUT:

['These', '45', 'are', 'cars']
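
If you would rather not depend on every prefix being present in li_b, another option is to look ahead from each position and remember the longest concatenation that is a known word. The following is only a sketch of that idea in Python 3, not a drop-in replacement for the function above; the name `split_longest_known` and its parameters are made up for this example.

```python
def split_longest_known(items, words):
    """Group consecutive items into the longest known word at each position.

    `items` is the list of fragments and `words` the known vocabulary
    (both names are hypothetical). Unknown fragments are kept unchanged.
    """
    known = set(words)  # fast membership tests
    max_len = max((len(w) for w in known), default=0)
    result = []
    i = 0
    while i < len(items):
        candidate = ''
        best_end = None  # index just past the longest match found so far
        for j in range(i, len(items)):
            candidate += items[j]
            if len(candidate) > max_len:
                break  # no known word can be this long
            if candidate in known:
                best_end = j + 1
        if best_end is None:
            result.append(items[i])  # unknown fragment: keep it as it was
            i += 1
        else:
            result.append(''.join(items[i:best_end]))
            i = best_end
    return result


li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
li_b = ['T','Th','The','These','a','ar','are','c','ca','car','cars']  # no 'Thes'
print(split_longest_known(li_a, li_b))  # ['These', '45', 'are', 'cars']
```

Because the longest match is searched for directly, a missing intermediate prefix such as 'Thes' does not cut the word short, and no second merging pass is needed.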
Mohd
  • Thanks for your response, it's exactly what I was expecting, but the result can go wrong if one of the intermediate sequences is missing from `li_b`. For example, with `li_b = ['T','Th','The','These','a','ar','are','c','ca','car','cars']`, where `Thes` is missing, the result becomes `['The', 's', 'e', '45', 'are', 'cars']`. Any suggestion for tweaking it to obtain `['These', '45', 'are', 'cars']`? – Htet May 20 '17 at 08:14
  • Do you want to keep the order of new_li, or is it OK if the order changes? – Mohd May 20 '17 at 14:49
  • @htetmyet I just edited the answer and included the code for the scenario you mentioned – Mohd May 20 '17 at 15:43
  • Your code returns the right result, thanks for the effective response. You just saved my day. Thanks. – Htet May 23 '17 at 01:23