
I have this Python code for finding the longest common substring of two strings. I'm trying to figure out its asymptotic running time; I've arrived at an answer, but I'm not sure it's correct. Here is the code:

def longest_substring(s, t):
    # Brute force: compare every substring of s with every substring of t
    best = ''
    for s_start in range(0, len(s)):
        for s_end in range(s_start, len(s)+1):
            for t_start in range(0, len(t)):
                for t_end in range(t_start, len(t)+1):
                    if s[s_start:s_end] == t[t_start:t_end]:
                        current = s[s_start:s_end]
                        # keep the longest common substring seen so far
                        if len(current) > len(best):
                            best = current
    return best

Obviously this function has a very slow running time; it was designed that way. My reasoning was that because there is a for loop with three more nested for loops, the running time is something like O(n^4). I'm not sure this is correct, because not every loop iterates over the full input size. Also, assume that len(s) = len(t) = n (the input size). Any ideas?

donut juice
  • O(n^4) looks correct - remember big O is about worst case complexity for very large inputs. – Tom Dalton Feb 18 '16 at 00:57
  • @TomDalton Thanks for the answer. In the first nested for loop, is the runtime just O(n)? Or do the varying ranges make it something else? – donut juice Feb 18 '16 at 01:00
  • Note that the loop for `t_end` is redundant; `t_end = t_start + s_end - s_start` or your substrings will be of different lengths, making `s[s_start:s_end] == t[t_start:t_end]` impossible. – Hugh Bothwell Feb 18 '16 at 02:30
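A minimal sketch of the change Hugh Bothwell describes, assuming the same setup as the question (len(s) == len(t) == n). The name `longest_substring_no_t_end` is hypothetical, and `best` starts as an empty string so that single-character matches are kept. Dropping the `t_end` loop leaves three nested O(n) loops around an O(n) slice and comparison, so by the same counting argument this version would be O(n^4) instead of O(n^5).

```python
def longest_substring_no_t_end(s, t):
    # Hypothetical variant of the question's function: only slices of equal
    # length can compare equal, so t_end is derived instead of looped over.
    best = ''
    for s_start in range(len(s)):
        for s_end in range(s_start, len(s) + 1):
            length = s_end - s_start
            # only start positions that leave room for `length` chars in t
            for t_start in range(len(t) - length + 1):
                t_end = t_start + length
                if s[s_start:s_end] == t[t_start:t_end] and length > len(best):
                    best = s[s_start:s_end]
    return best


print(longest_substring_no_t_end("xabcy", "zabcw"))  # abc
```

When `length` exceeds len(t), `range` gets a non-positive argument and the inner loop simply does not run, which matches the original behaviour (a truncated slice of t can never equal a longer slice of s).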

1 Answer


If you're not convinced that it's O(n^5), try counting how many iterations the loops over string s alone run (i.e. the outer two loops). When s_start == 0, the inner loop runs n + 1 times; when s_start == 1, it runs n times, and so on, until s_start == n - 1, for which the inner loop runs twice.

The sum

(n + 1) + (n) + (n - 1) + ... + 2

is an arithmetic series with n terms, so it equals

((n + 1) + 2) * n / 2

which is O(n^2).
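A quick brute-force check of the arithmetic-series count, assuming len(s) == len(t) == n as in the question. The helper names are hypothetical; they mirror the loop bounds of `longest_substring` but count iterations instead of comparing strings. The second helper shows that the two loops over t multiply the count by the same factor, so the comparison line runs (n(n + 3) / 2)^2 times, which is Θ(n^4).

```python
def count_s_pairs(n):
    """Iterations of the two outer loops (s_start, s_end) when len(s) == n."""
    count = 0
    for s_start in range(0, n):
        for s_end in range(s_start, n + 1):
            count += 1
    return count


def count_comparisons(n):
    """Times the `==` line runs when len(s) == len(t) == n."""
    count = 0
    for s_start in range(0, n):
        for s_end in range(s_start, n + 1):
            for t_start in range(0, n):
                for t_end in range(t_start, n + 1):
                    count += 1
    return count


for n in range(1, 8):
    pairs = count_s_pairs(n)
    # matches the arithmetic-series formula ((n + 1) + 2) * n / 2
    assert pairs == ((n + 1) + 2) * n // 2
    # the t loops repeat the same pattern, so the counts multiply
    assert count_comparisons(n) == pairs ** 2

print(count_s_pairs(10), count_comparisons(10))  # 65 4225
```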

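The two loops over t contribute the same O(n^2) factor, giving O(n^4) iterations of the innermost body. An additional factor of n comes from that body: the slices s[s_start:s_end] and t[t_start:t_end] each copy up to n characters, and the comparison s[s_start:s_end] == t[t_start:t_end] can take up to n steps, so each iteration is O(n). The total is O(n^2) · O(n^2) · O(n) = O(n^5).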
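To see the extra factor of n concretely, the sketch below totals the characters copied by the two slices across every iteration, again assuming len(s) == len(t) == n. `chars_sliced` is a hypothetical helper, and it models only the slicing, not the comparison (in CPython, `==` on two `str` objects of different lengths can return without scanning them, so the slices are where the O(n) cost reliably shows up). The closed form in the assertion follows from summing the slice lengths over each pair of loops; its leading term is n^5 / 6, i.e. Θ(n^5).

```python
def chars_sliced(n):
    """Total characters copied by s[s_start:s_end] and t[t_start:t_end]
    over every iteration of the question's four loops (len(s) == len(t) == n)."""
    total = 0
    for s_start in range(0, n):
        for s_end in range(s_start, n + 1):
            for t_start in range(0, n):
                for t_end in range(t_start, n + 1):
                    # each slice copies (end - start) characters
                    total += (s_end - s_start) + (t_end - t_start)
    return total


for n in range(1, 8):
    # closed form n^2 (n + 1)(n + 2)(n + 3) / 6, leading term n^5 / 6
    assert chars_sliced(n) == n * n * (n + 1) * (n + 2) * (n + 3) // 6

print(chars_sliced(10))  # 28600
```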

univerio
  • This was a great answer. I tried calculating it with the arithmetic series but ended up confusing myself more. Thanks for the clarification. – donut juice Feb 18 '16 at 01:09
  • I think you're missing a factor of n. The comparison `s[s_start:s_end] == t[t_start:t_end]` is an O(n) operation, too: (in the worst case) it has to crank along the two sequences comparing them element-wise. That makes the complexity O(n^5), not O(n^4). – Benjamin Hodgson Feb 18 '16 at 01:13
  • @BenjaminHodgson That's a good point. I've updated my answer. – univerio Feb 18 '16 at 01:28
  • @donutjuice Please see the comments and the updated answer. – univerio Feb 18 '16 at 01:29