Function to find all common substrings in two strings not giving correct output

Question

I am using the following function to find all the common substrings between two strings:

def substringFinder(string1, string2):
    answer = ""
    anslist=[]
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
                j=j+1
            else:
                #if (len(match) > len(answer)): 
                answer = match
                if answer != '':
                    anslist.append(answer)
                match = ""

        if match != '':
            anslist.append(match)
        break
    print(anslist)

So when I call substringFinder("ALISSA", "ALYSSA") it gives ['AL', 'SSA'], which is fine. But when I call substringFinder("AHAMMAD", "AHAMAD"), it only gives ['AHAM'], whereas I want ['AHAM', 'MAD']. How can I get that?

why `['AHAM', 'AD']` and not `['AHAM', 'MAD']`? – Nir Alfasi Aug 15 '17 at 22:44
also, why did you add the `break`? – Nir Alfasi Aug 15 '17 at 22:45

score 1 · Answer 1 · answered Aug 15 '17 at 23:26

You can try this:

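def substrings(s1, s2):
    # Every slice s1[i:b+1]; empty slices (b < i) are dropped by the length check below.
    final = [s1[i:b+1] for i in range(len(s1)) for b in range(len(s1))]
    # Keep slices of length >= 2 that also occur in s2.
    return [sub for sub in final if sub in s2 and len(sub) > 1]

s1, s2 = "ALISSA", "ALYSSA"
print(substrings(s1, s2))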

Output:

['AL', 'SS', 'SSA', 'SA']

Ajax1234

So for ALISSA & ALYSSA, as I mentioned, I want ['AL', 'SSA'], i.e., find the longest match for each part where a non-matching character ('I' & 'Y') occurs. And for AHAMMAD & AHAMAD, I want ['AHAM', 'MAD']; the non-matching character is 'M' in the first string. – Rajiv Aug 16 '17 at 16:41

score 0 · Answer 2 · answered Aug 15 '17 at 23:00

Here is a straightforward brute-force solution:

In [7]: def substring_finder(s1, s2):
   ...:     matches = []
   ...:     size = len(s1)
   ...:     for i in range(2, size):
   ...:         for j in range(0, size, i):
   ...:             stop = j+i
   ...:             if stop > size:
   ...:                 continue
   ...:             sub = s1[j:stop]
   ...:             if sub in s2:
   ...:                 matches.append(sub)
   ...:     return matches
   ...:

In [8]: substring_finder("ALISSA", "ALYSSA")
Out[8]: ['AL', 'SA', 'SSA']

In [9]: substring_finder("AHAMMAD", "AHAMAD")
Out[9]: ['AH', 'AM', 'MA', 'AHA', 'AHAM']

juanpa.arrivillaga

So for ALISSA & ALYSSA, as I mentioned, I want ['AL', 'SSA'], i.e., find the longest match for each part where a non-matching character ('I' & 'Y') occurs. And for AHAMMAD & AHAMAD, I want ['AHAM', 'MAD']; the non-matching character is 'M' in the first string. – Rajiv Aug 16 '17 at 16:41
@Rajiv yes, I wasn't sure exactly what your requirements were. You could obviously filter the above to meet the requirements. – juanpa.arrivillaga Aug 16 '17 at 17:06
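
The filtering suggested in the comment above might look roughly like the sketch below. This is not the answerer's code: `all_common_substrings` and `keep_maximal` are hypothetical helper names, and the first helper tries every window of `s1` (not only the non-overlapping chunks that `substring_finder` checks), because a match such as 'MAD' would otherwise never be a candidate.

```python
def all_common_substrings(s1, s2, min_len=2):
    """Return each distinct substring of s1 (length >= min_len) that also occurs in s2."""
    found = []
    for start in range(len(s1)):
        for stop in range(start + min_len, len(s1) + 1):
            sub = s1[start:stop]
            if sub in s2 and sub not in found:
                found.append(sub)
    return found


def keep_maximal(subs):
    """Drop every substring that is contained in a longer one from the same list."""
    return [s for s in subs if not any(s != t and s in t for t in subs)]


print(keep_maximal(all_common_substrings("ALISSA", "ALYSSA")))   # ['AL', 'SSA']
print(keep_maximal(all_common_substrings("AHAMMAD", "AHAMAD")))  # ['AHAM', 'MAD']
```

One caveat: the filter works purely on text containment, so a short match that also happens to appear inside a longer match is dropped even if it occurs separately elsewhere in the strings.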

JacobIRR · Accepted Answer · 2017-08-15T23:11:27.513

Don't break.
Check the length of the string before adding it, to avoid results like "A".
Return the result from the function instead of printing inside it.

Like this:

def substringFinder(string1, string2):
    answer = ""
    anslist=[]
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                #if (len(match) > len(answer)): 
                answer = match
                if answer != '' and len(answer) > 1:
                    anslist.append(answer)
                match = ""

        if len(match) > 1:  # same length check for a match that runs to the end of string2
            anslist.append(match)
        # break
    return anslist

print(substringFinder("AHAMMAD", "AHAMAD"))

result: ['AHAM', 'MAD']
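
Note that the function above only shifts string1 forward against string2 (offsets i >= 0), so the result depends on argument order: tracing it by hand, substringFinder("AHAMAD", "AHAMMAD") yields only ['AHAM']. A minimal sketch of the same run-matching idea that also tries negative offsets follows; `common_runs` and `min_len` are hypothetical names, not part of the answer.

```python
def common_runs(s1, s2, min_len=2):
    """Collect runs of equal characters along every alignment of s1 and s2."""
    runs = []
    # offset > 0 skips the start of s1; offset < 0 skips the start of s2.
    for offset in range(-(len(s2) - 1), len(s1)):
        run = ""
        for j in range(len(s2)):
            i = offset + j
            if 0 <= i < len(s1) and s1[i] == s2[j]:
                run += s2[j]
            else:
                if len(run) >= min_len:
                    runs.append(run)
                run = ""
        if len(run) >= min_len:  # a run that reaches the end of s2
            runs.append(run)
    return runs


print(sorted(common_runs("AHAMMAD", "AHAMAD")))  # ['AHAM', 'MAD']
print(sorted(common_runs("AHAMAD", "AHAMMAD")))  # ['AHAM', 'MAD']
```

The runs come back in alignment order rather than in order of position, which is why the example sorts them before printing.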
