Find all the common substrings between two strings, regardless of case and order

Question

So i started with the code from an answer to this question Function to find all common substrings in two strings not giving correct output and modified it a little bit to accommodate case-independence (i.e. AbCd is the same as ABCD as Abcd and so on) by turning the string to lowercase. However, for strings like 'ABCDXGHIJ' and 'ghijYAbCd', it only returns ['ghij'], not the desired output ['ABCD', 'GHIJ'].

Here are other examples:

'Bonywasawarrior' and 'Bonywasxwarrior' (output: ['Bonywas', 'warrior', 'wa'], desired output: ['Bonywas', 'warrior'])
'01101001' and '101010' (output: ['1010', '0', '1010', '01', '10', '01'], desired output: ['1010'])

here is my code:

t = int(input()) #t cases

while t > 0:
    A = str(input()) #1st string
    B = str(input()) #2nd string

    low_A = A.lower()
    low_B = B.lower()

    answer = ""
    anslist=[]
    for i in range(len(A)):
        common = ""
        for j in range(len(B)):
            if (i + j < len(A) and low_A[i + j] == low_B[j]):
                common += B[j]
            else:
                #if (len(common) > len(answer)): 
                answer = common
                if answer != '' and len(answer) > 1:
                    anslist.append(answer)
                common = ""

        if common != '':
            anslist.append(common)

    if len(anslist) == 0:
        print('[]') #print if no common substring
    else:
        print(anslist)
    t -= 1

You are not checking the first A from "ABCDXGHIJ" ever, you always check low_A[i+j] and to match the "abcd", you will be checking low_A[0+5] and so low_A[0] will always be compared with "g" (low_A[0+0]). — SajanGohil, Oct 05 '19 at 19:43

blhsing · Answer 1 · 2019-10-05T22:11:36.633

You can increment an offset in a while loop to keep concatenating common characters at the offset from the respective indices until they become different instead. To find the longest, non-overlapping common substrings, you can use a function that recursively traverses different paths of substring partitioning, and returns the one with the longest lengths of substrings:

def common_strings(a, b, i=0, j=0):
    candidates = []
    len_a = len(a)
    len_b = len(b)
    if j == len_b:
        candidates.append(common_strings(a, b, i + 1, 0))
    elif i < len_a:
        offset = 0
        while i + offset < len_a and j + offset < len_b and a[i + offset].lower() == b[j + offset].lower():
            offset += 1
        if offset > 1:
            candidates.append([a[i: i + offset]] + common_strings(a, b, i + offset, j + offset))
        candidates.append(common_strings(a, b, i, j + 1))
    return candidates and max(candidates, key=lambda t: sorted(map(len, t), reverse=True))

so that:

print(common_strings('ABCDXGHIJ', 'ghijYAbCd'))
print(common_strings('Bonywasawarrior', 'Bonywasxwarrior'))
print(common_strings('01101001', '101010'))

outputs:

['ABCD', 'GHIJ']
['Bonywas', 'warrior']
['1010']

doesn't work for: 'Bonywasawarrior' , 'Bonywasxwarrior' output: ['Bonywas', 'wa', 'rrior'] and '01101001', '101010' output: ['01', '1010', '01'] — Eugene, Oct 05 '19 at 16:24
@Eugene: If you're looking for the longest matching substring, you're going to need to think of a completely different way to go about this. — Bill Lynch, Oct 05 '19 at 17:31

score 0 · Answer 2 · answered Oct 05 '19 at 19:11

This is a duplicate of Finding all the common substrings of given two strings, which offers a solution in Java and for which I have done my best to translate to Python with the "enhancement" of making it case-insensitive:

def find_common(s, t):
    table = [len(t)*[0] for i in range(len(s))]
    longest = 0
    result = set()
    for i, ch1 in enumerate(s.lower()):
        for j, ch2 in enumerate(t.lower()):
            if ch1 != ch2:
                continue
            table[i][j] = 1 if i == 0 or j == 0 else 1 + table[i - 1][j - 1]
            if table[i][j] > longest:
                longest = table[i][j]
                result.clear()
            if table[i][j] == longest:
                result.add(s[i - longest + 1:i + 1]);
    return result


print(find_common('Bonywasawarrior', 'Bonywasxwarrior'))
print(find_common('01101001', '101010'))
print(find_common('ABCDXGHIJ', 'ghijYAbCd'))

Prints:

{'Bonywas', 'warrior'}
{'1010'}
{'GHIJ', 'ABCD'}

Find all the common substrings between two strings, regardless of case and order

2 Answers2