0

I need the longest prefix of s1 that also occurs as a substring of s2. This code (adapted from a prefix-suffix solution) is quite slow for larger corpora:

s1 = 'gafdggeg'
s2 = 'adagafrd'

Output: gaf

def pref_also_substr(s1, s2):
    # Try prefixes of s1 from longest to shortest
    for res in range(len(s1), 0, -1):
        prefix = s1[:res]
        if prefix in s2:
            return res  # length of the longest matching prefix

    # No prefix of s1 occurs in s2
    return 0

Is there a more efficient alternative?

martineau
JohnGS
  • If the dominant size of the problem is the length of `s2`, and `s1` is relatively small, you can solve this problem efficiently using regex search, by building a regex from `s1` that matches any prefix of `s1`. If you are interested, I can show you how to do this. If the length of `s1` is only bounded by `O(len(s2))`, however, then this approach is not optimal, and you will need to code the search by hand. Either way, the problem can be solved in linear time (in `len(s2)`), which is much better than your current solution. – Amitai Irron May 14 '20 at 22:45
  • This looks like a simpler (or restricted) version of the [Longest common substring problem](https://en.wikipedia.org/wiki/Longest_common_substring_problem), for which efficient suffix-tree-based solutions exist. – amain May 14 '20 at 23:08
  • What is the minimum length of the substring? – martineau May 14 '20 at 23:25
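
The linear-time claim in the first comment can be illustrated with a Z-array, which is a swapped-in technique and not necessarily what the commenter had in mind. This is a minimal sketch: `z_array` and `longest_prefix_in` are hypothetical names, and it assumes both inputs are ordinary Python strings. The Z-array of `s1 + s2` gives, for every position in `s2`, how many characters match the start of `s1`; capping each value at `len(s1)` makes a separator character unnecessary.

```python
def z_array(s):
    """z[i] = length of the longest common prefix of s and s[i:] (z[0] is left at 0)."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost segment known to match a prefix of s
    for i in range(1, n):
        if i < right:
            # Reuse what was already computed inside the known matching segment
            z[i] = min(right - i, z[i - left])
        # Extend the match character by character past what is already known
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def longest_prefix_in(s1, s2):
    """Return the longest prefix of s1 that occurs somewhere in s2."""
    m = len(s1)
    z = z_array(s1 + s2)
    # Only positions inside s2 count; cap at m because a match cannot exceed s1
    best = max((min(v, m) for v in z[m:]), default=0)
    return s1[:best]


print(longest_prefix_in('gafdggeg', 'adagafrd'))
```

The running time is proportional to `len(s1) + len(s2)`, at the cost of building one concatenated string and one integer list of the same length.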

2 Answers

0

Here is another approach. First, generate all substrings of s2. Then keep a dictionary d that holds only the longest substring found so far that is also a prefix of s1.

s2 = 'adagafrd'
# Get all substrings of s2
# using a list comprehension + string slicing
substrings = [s2[i:j] for i in range(len(s2))
              for j in range(i + 1, len(s2) + 1)]

Now use the startswith() method to check which of these substrings is a prefix of s1, keeping the longest one by comparing lengths.

s1 = 'gafdggeg'
d = {}
for substring in substrings:
    if s1.startswith(substring):
        # Keep only the longest matching prefix seen so far
        if not d or len(substring) > list(d.values())[0]:
            d = {substring: len(substring)}
print(d)

Output:

{'gaf': 3}
0
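def f(s1, s2):
    for i in range(1, len(s1) + 1):
        p = s1[:i]
        if p in s2:
            # Drop everything before the match; a longer prefix cannot start earlier
            s2 = s2[s2.index(p):]
        else:
            return i - 1
    # Every prefix matched, so all of s1 occurs in s2
    return len(s1)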

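Check the prefixes of s1 starting from length 1. Whenever a prefix is found in s2, discard the characters of s2 before the match and continue searching with the next longer prefix. The function returns the length of the longest prefix found.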
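
As a possible variant of the idea above, the repeated slicing of s2 can be avoided by passing a start offset to `str.find`. This is only a sketch: `longest_prefix_len` is a hypothetical name, and like the function above it returns a length, not the prefix itself.

```python
def longest_prefix_len(s1, s2):
    """Length of the longest prefix of s1 that occurs in s2."""
    start = 0  # earliest index in s2 where a match can still begin
    for i in range(1, len(s1) + 1):
        # Search only from the position of the previous, shorter match
        pos = s2.find(s1[:i], start)
        if pos == -1:
            return i - 1
        start = pos
    # Every prefix matched, so all of s1 occurs in s2
    return len(s1)


s1 = 'gafdggeg'
s2 = 'adagafrd'
n = longest_prefix_len(s1, s2)
print(n, s1[:n])
```

The search offset only moves forward because any occurrence of a longer prefix must begin at an occurrence of the shorter one.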

angeldeluz777