4

PS: This is not a duplicate of How to find the overlap between 2 sequences, and return it

[Although I ask for solutions in above approach if it could be applied to the following problem]

Q: Although I got it right, it is still not a scalable solution and is definitely not optimized (low on score). Read the following description of the problem and kindly offer better solution.

Question:

For simplicity, we require prefixes and suffixes to be non-empty and shorter than the whole string S. A border of a string S is any string that is both a prefix and a suffix. For example, "cut" is a border of a string "cutletcut", and a string "barbararhubarb" has two borders: "b" and "barb".

class Solution { public int solution(String S); }

that, given a string S consisting of N characters, returns the length of its longest border that has at least three non-overlapping occurrences in the given string. If there is no such border in S, the function should return 0.

For example,

  • if S = "barbararhubarb" the function should return 1, as explained above;
  • if S = "ababab" the function should return 2, as "ab" and "abab" are both borders of S, but only "ab" has three non-overlapping occurrences;
  • if S = "baaab" the function should return 0, as its only border "b" occurs only twice.

Assume that:

  • N is an integer within the range [0..1,000,000];
  • string S consists only of lower-case letters (a−z).

Complexity:

  • expected worst-case time complexity is O(N);
  • expected worst-case space complexity is O(N) (not counting the storage required for input arguments).

def solution(S):
    S = S.lower()
    presuf = []
    f = l = str()
    rank = []
    wordlen = len(S)
    for i, j in enumerate(S):
        y = -i-1
        f += S[i]
        l = S[y] + l
        if f==l and f != S:
            #print f,l
            new=S[i+1:-i-1]
            mindex = new.find(f)
            if mindex != -1:
                mid = f #new[mindex]
                #print mid
            else:
                mid = None
            presuf.append((f,mid,l,(i,y)))
    #print presuf
    for i,j,k,o in presuf:
        if o[0]<wordlen+o[-1]: #non overlapping
            if i==j:
                rank.append(len(i))
            else:
                rank.append(0)
    if len(rank)==0:
        return 0
    else:
        return max(rank)

My solutions time complexity is: O(N2) or O(N4) Help greatly appreciated.

Community
  • 1
  • 1
user2290820
  • 2,709
  • 5
  • 34
  • 62

5 Answers5

1

My solution is combination between Rabin-Karp and Knuth–Morris–Pratt algorithms. http://codility.com/cert/view/certB6J4FV-W89WX4ZABTDRVAG6/details

0

I have a (Java) solution that performs O(N) or O(N**3), for a resulting 90/100 overall, but I can't figure out how to make it go though 2 different testcases:

almost_all_same_letters aaaaa...aa??aaaa??....aaaaaaa 2.150 s. TIMEOUT ERROR running time: >2.15 sec., time limit: 1.20 sec.

same_letters_on_both_ends 2.120 s. TIMEOUT ERROR running time: >2.12 sec., time limit: 1.24 sec.

Edit: Nailed it! Now I have a solution that perform in O(N) and passes all the checks for a 100/100 result :) I didn't know Codility, but it's a nice tool!

  • what is the actual string? 'aaaaa...aa??aaaa' ?? – user2290820 Jul 09 '13 at 12:53
  • Yes. Now I changed approach, and my solution has complexity O(N), points 95/100.. but still... 1 of the timeouts miss to complete in time: single_letter_with_some_tweaks 2.150 s. TIMEOUT ERROR running time: >2.15 sec., time limit: 1.32 sec. – Vincenzo Melandri Jul 10 '13 at 07:56
  • In that string the ".." in "aaaaa..aaa" probably means "an awful lot of As". – Vincenzo Melandri Jul 10 '13 at 08:08
  • its always better to display the code that you are talking about. – user2290820 Jul 11 '13 at 20:06
  • I don't think this is the case (to show the code I mean), don't want to spoil the solution for who will take the challenge! :) By the way, the brute force approach of the code on your post will never perform better than 90/100 on the codility challenge :) – Vincenzo Melandri Jul 12 '13 at 02:09
0

I have a solution with suffix arrays (there actually is algorithm for constructing SA and LCP in linear time or something bit worse than that, but surely not quadratic).

Still not sure if I can go without RMQs ( O(log n) with SegmentTree) which I couldn't make pass my own cases and seems quite complicated, but with RMQs it can (not mentioning approach with for loop instead of RMQ, that would make it quadratic anyway).

Solution is performing quite fast and passing my 21 test cases with various perks I've managed to craft, but still failing on some of their cases. Not sure if that helped you or gave you idea how to approach the problem, but I am sure that naive solution, like @Vicenco said in some of his comments, can't get you better than Silver.

EDIT: managed to fix it all problems, but still to slow. I had to enforce some conditions but had to increase complexity with this, still not sure how to optimize that. Will keep you posted. Good luck!

Ivor Prebeg
  • 988
  • 6
  • 11
0
protected int calcBorder(String input) {
    if (null != input) {
        int mean = (input.length() / 3);
        while (mean >= 1) {
            if (input.substring(0, mean).equals(
                    input.substring(input.length() - mean))) {
                String reference = input.substring(0, mean);
                String temp = input
                        .substring(mean, (input.length() - mean));
                int startIndex = 0;
                int endIndex = mean;
                int count = 2;
                while (endIndex <= temp.length()) {
                    if (reference.equals(temp.substring(startIndex,
                            endIndex))) {
                        count++;
                        if (count >= 3) {
                            return reference.length();
                        }
                    }
                    startIndex++;
                    endIndex++;
                }
            }
            mean--;
        }
    }
    return 0;
}
Binesh
  • 1
0

The Z-Algorithm would be a good solution.

Will
  • 24,082
  • 14
  • 97
  • 108
Mitesh Joshi
  • 177
  • 1
  • 9