
Here's the Python code:

def is_palindrome(s):
    return s == s[::-1]


def longestp(s):
    if is_palindrome(s):
        return s

    maxp = s[0]

    for i in range(len(s)-1):
        half_length = len(maxp) // 2
        start = i - half_length
        end = i + half_length

        while start >= 0 and end <= len(s)-1:
            if is_palindrome(s[start:end+2]):
                if len(s[start:end+2]) > len(maxp):
                    maxp = s[start:end+2]
                end += 1
            elif is_palindrome(s[start:end+1]):
                if len(s[start:end+1]) > len(maxp):
                    maxp = s[start:end+1]
                start -= 1
                end += 1
            else:
                break

    return maxp

I initially thought it was O(n^3) because of two nested loops and string slicing, but it turned out to be nearly linear in my tests. Is there any kind of input for which this algorithm is going to be slow?

Eugene Yarmash
  • "it turned out to be nearly linear in my tests": slicing adds complexity, but it's not as slow as a Python loop. How did you measure the "linear" side? – Jean-François Fabre Mar 30 '18 at 21:08
  • Aside: I'm sure something faster than `len(s[start:end+1])` can be done; slicing a string just to compute its length is sub-optimal. I think you'd be better off rewriting `is_palindrome` to take start & end parameters, with no slicing anywhere in your program. That would probably be much faster (and less complex :)) – Jean-François Fabre Mar 30 '18 at 21:09
  • `how did you measure the "linear" side?` - run `%timeit` with 100-, 1k-, 10k-character input strings (random and palindromic). – Eugene Yarmash Mar 30 '18 at 21:11
  • @EugeneYarmash what were the "strings"? How did you ensure that the 10k-strings contain palindromic substrings of varying length? – Andrey Tyukin Mar 30 '18 at 21:13
  • You probably timed it with palindromes, which hit the `if is_palindrome(s):` case. – user2357112 Mar 30 '18 at 21:13
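The slicing-free rewrite suggested in the comments might look like this (a sketch; `is_palindrome_range` is a hypothetical helper name, not code from the question):

```python
def is_palindrome_range(s, start, end):
    """Return True if s[start:end+1] is a palindrome, comparing
    characters by index so no temporary slice is built."""
    while start < end:
        if s[start] != s[end]:
            return False
        start += 1
        end -= 1
    return True

print(is_palindrome_range("abaxyz", 0, 2))  # the "aba" prefix -> True
print(is_palindrome_range("abaxyz", 0, 5))  # the whole string -> False
```

Each call is still O(length of the range), but it avoids allocating and copying a new string on every check.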

2 Answers


The algorithm looks as if it needs total time proportional to

integral_0^N x dx = [x^2/2]_0^N = N^2/2 = O(N^2)

Strings matching `ab*` (one `a` followed by all `b`s) should give the worst-case behavior: `maxp` quickly grows to nearly the whole string, so the slices handed to `is_palindrome` inside the loop are O(N) characters long.

Here is a piece of code that kind-of demonstrates the worst case behavior experimentally.

The structure is as follows:

  1. Define worstCase function that constructs "bad" strings of length N
  2. Measure time of your function on these strings
  3. Create dataset of log(N) vs. log(time(N))
  4. Fit a line, try to estimate the slope of the line: this is the exponent p in your O(N^p).

Here is the code:

def worstCase(length):
    return "a" + "b" * (length - 1)

from time import perf_counter  # time.clock() was removed in Python 3.8
from math import log

xs = []
ys = []
for n in [4 * int(1000 * 1.2 ** k) for k in range(1, 20)]:
    s = worstCase(n)
    assert len(s) == n
    startTime = perf_counter()
    p = longestp(s)
    endTime = perf_counter()
    assert p == s[1:]
    t = endTime - startTime
    xs.append(log(n))
    ys.append(log(t))
    print("%d -> %f" % (n, t))

from numpy import polyfit

# fit a line to the log-log data; its slope estimates the exponent p in O(N^p)
exponent, constant = polyfit(xs, ys, 1)

print("Exponent was: %f" % exponent)

Here is the output (takes a minute or two):

4800 -> 0.057818
5760 -> 0.078123
6908 -> 0.105169
8292 -> 0.145572
9952 -> 0.197657
11940 -> 0.276103
14332 -> 0.382668
17196 -> 0.534682
20636 -> 0.747468
24764 -> 1.048267
29720 -> 1.475469
35664 -> 2.081608
42796 -> 2.939904
51356 -> 4.216063
61628 -> 5.963550
73952 -> 8.691849
88744 -> 12.126039
106492 -> 19.684188
127788 -> 24.942766
Exponent was: 1.867208    

It estimates the exponent to be ~1.86, which is closer to 2 than to 3.
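As a sanity check of the fitting method itself (a sketch, not part of the original answer), feeding the fit synthetic "timings" that are exactly quadratic should recover a slope of 2:

```python
from math import log
from numpy import polyfit

ns = [1000 * 2 ** k for k in range(8)]
times = [1e-9 * n * n for n in ns]   # pretend measurements, exactly O(N^2)
xs = [log(n) for n in ns]
ys = [log(t) for t in times]

# slope of the log-log line = exponent of the power law
exponent, constant = polyfit(xs, ys, 1)
print("Exponent was: %f" % exponent)  # -> 2.000000
```

Real timings are noisy, so the ~1.86 above is consistent with a true O(N^2) that measurement overhead pulls slightly below 2.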

Andrey Tyukin

It's definitely not linear. Try it with input that contains a lot of palindromes, but isn't a palindrome:

>>> timeit.timeit('longestp(x)', 'x="a"*100000+"b"', globals=globals(), number=1)
5.5123205203562975
>>> timeit.timeit('longestp(x)', 'x="a"*10000+"b"', globals=globals(), number=1)
0.08460151217877865

Slicing and `s == s[::-1]` have a much better constant factor than interpreted Python code, and you need to make sure the inner loop isn't breaking early. Those effects may have thrown off your attempt to judge time complexity by timing.


I don't think it's O(n^3) either. The nested loops don't interact the way you might intuitively expect, because of the break condition. The inner loop executes O(n) iterations over the whole course of the algorithm, because after a bounded number of iterations, either len(maxp) grows, or the loop breaks. This algorithm looks worst-case O(n^2) to me.
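One way to check this claim empirically is to instrument the question's function with an iteration counter (a hypothetical sketch, not from the answer) and watch the inner-loop count grow only linearly, even on the slow inputs:

```python
def is_palindrome(s):
    return s == s[::-1]

def longestp_counted(s):
    """The question's longestp, plus a counter on the inner while loop
    (hypothetical instrumentation for this demonstration)."""
    if is_palindrome(s):
        return s, 0
    maxp = s[0]
    inner = 0
    for i in range(len(s) - 1):
        half_length = len(maxp) // 2
        start = i - half_length
        end = i + half_length
        while start >= 0 and end <= len(s) - 1:
            inner += 1
            if is_palindrome(s[start:end+2]):
                if len(s[start:end+2]) > len(maxp):
                    maxp = s[start:end+2]
                end += 1
            elif is_palindrome(s[start:end+1]):
                if len(s[start:end+1]) > len(maxp):
                    maxp = s[start:end+1]
                start -= 1
                end += 1
            else:
                break
    return maxp, inner

for n in (1000, 2000, 4000):
    maxp, inner = longestp_counted("a" * n + "b")
    print(n, inner)  # iteration count grows roughly in proportion to n
```

The inner loop runs O(n) times in total, but each iteration does O(n) work slicing and comparing, which is where the O(n^2) comes from.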

user2357112