
I am trying to find how many prefixes of a string of length n are equal to the suffix of the same length, and what those lengths are. The prefix and the suffix may overlap. For example, if the string is "abacaba" then the answer is {1, 3, 7}: the prefix "a" equals the suffix "a", the prefix "aba" equals the suffix "aba", and the whole string equals itself. If the string is "aaaaa" then the answer is {1, 2, 3, 4, 5}: "a", "aa", "aaa", "aaaa", "aaaaa".

I can only do this in O(n^2), by taking every prefix and comparing it with the suffix of the same length. Is there a better algorithm to solve this? Thanks in advance.
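
For reference, the quadratic approach described above could be sketched roughly as follows. The function name `matching_prefix_lengths_naive` is only illustrative and does not come from the original post.

```python
def matching_prefix_lengths_naive(s):
    """Return every length k with s[:k] == s[-k:], in increasing order."""
    n = len(s)
    lengths = []
    for k in range(1, n + 1):
        # Comparing two slices of length k costs O(k),
        # so the whole loop is O(n^2) in the worst case.
        if s[:k] == s[n - k:]:
            lengths.append(k)
    return lengths


print(matching_prefix_lengths_naive("abacaba"))  # [1, 3, 7]
print(matching_prefix_lengths_naive("aaaaa"))    # [1, 2, 3, 4, 5]
```

A baseline like this is handy for checking faster solutions on small inputs.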

Priyantha
  • If you post the algorithm you have currently implemented it would be easier for people to help you – oyvindhauge Nov 10 '17 at 14:37
  • @AJAY, if it's a problem from an online judge, can you please share the link? – Shihab Shahriar Khan Nov 10 '17 at 17:16
  • This is not the full problem; it is part of a Codeforces problem: http://codeforces.com/problemset/problem/432/D. If I could find the lengths of all the prefixes that match a suffix, I could then build the Z table with the Z algorithm, because it gives the length of the longest substring starting at position i that matches a prefix. I could then add each of those longest-match lengths to the counts for the shorter prefixes and print the result – AJAY HAYAGREEVE Nov 10 '17 at 17:23
  • I strongly suspect you can reuse the KMP failure table for that. I'm working on a solution to that right now – Shihab Shahriar Khan Nov 10 '17 at 17:45
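
The Z algorithm mentioned in the comments can also answer the original question directly: the suffix starting at index i matches a prefix exactly when its Z value reaches the end of the string. Below is a minimal sketch of that idea, assuming the usual definition of the Z array. The names `z_function` and `matching_prefix_lengths_z` are illustrative, not taken from the thread.

```python
def z_function(s):
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left, right = 0, 0  # [left, right) is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position inside the window.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def matching_prefix_lengths_z(s):
    n = len(s)
    z = z_function(s)
    # The suffix starting at i has length n - i; it equals the prefix of
    # that length exactly when the match at i runs to the end of the string.
    return sorted(n - i for i in range(n) if z[i] == n - i)


print(matching_prefix_lengths_z("abacaba"))  # [1, 3, 7]
```

Both functions run in O(n) apart from the final sort of the result.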

2 Answers


Hashing can help here.

Define the hash function of the string a1a2a3a4 as (a1 * 26^3 + a2 * 26^2 + a3 * 26^1 + a4 * 26^0) % M, where M is a large prime number.

Now keep two pointers, one at the start and one at the end. On every iteration, move the start pointer forward and update the hash of the prefix up to start, then move the end pointer backwards and update the hash of the suffix. If the two hashes are equal, the prefix and the suffix are equal (barring hash collisions).

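M = 10**9 + 7              # large prime modulus
s = "abacaba"              # example input
n = len(s)
hash_st = 0                # hash of the prefix s[0..st]
hash_ed = 0                # hash of the suffix s[ed..n-1]
power = 1                  # 26**st % M
result = []
for st in range(n):
    ed = n - 1 - st
    hash_st = (hash_st * 26 + ord(s[st])) % M
    hash_ed = (ord(s[ed]) * power + hash_ed) % M
    power = (power * 26) % M
    if hash_st == hash_ed:     # equal hashes: very likely equal strings
        result.append(st + 1)  # record the prefix length
print(result)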
sukunrt
  • What about collisions? Theoretically, this algorithm will fail if two different substrings hash to the same number, right? – oyvindhauge Nov 10 '17 at 14:31
  • Yeah, that's true. In practice you can minimise this by using 2 hash functions and checking for equality in both. Let me think of something which doesn't suffer from this – sukunrt Nov 10 '17 at 14:34
  • "let me think of something which doesn't suffer from this". Ok. – AJAY HAYAGREEVE Nov 10 '17 at 14:38
  • @AJAYHAYAGREEVE Considering this is an obvious homework assignment, the least you can do is post implementation of what you have already tried. Having someone just give you the answer won't help you. – oyvindhauge Nov 10 '17 at 14:56
  • I solved it using the common O(n^2) algorithm, i.e. for i = 1 to n { int j = n-i+1; bool p = true; for k = 1 to i { if(s[k] != s[j]) p = false; j++; } if(p) add(i); } I thought about it for about 2 hours but could not come up with any other solution. Then he gave me the idea of using hashing. It worked, but I doubt it will work for large strings because of collisions. – AJAY HAYAGREEVE Nov 10 '17 at 15:10
  • This is a good answer. Although not theoretically sound, an approach like this is very common in many real-world problems. – Shihab Shahriar Khan Nov 10 '17 at 15:40
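
The two-hash idea raised in the comments could look something like the sketch below. The base 131 and the two prime moduli are arbitrary choices made for illustration, and the function name is hypothetical. The result is still probabilistic; comparing the strings directly whenever both hashes agree would make it exact, at extra cost.

```python
def matching_prefix_lengths_double_hash(s, base=131,
                                        mods=(1_000_000_007, 998_244_353)):
    """Prefix lengths whose hash pair equals that of the same-length suffix."""
    n = len(s)
    pre = [0, 0]    # rolling hashes of the prefix s[:k+1], one per modulus
    suf = [0, 0]    # rolling hashes of the suffix s[n-1-k:], one per modulus
    power = [1, 1]  # base**k under each modulus
    lengths = []
    for k in range(n):
        front = ord(s[k])
        back = ord(s[n - 1 - k])
        for j, m in enumerate(mods):
            # Append a character to the prefix, prepend one to the suffix.
            pre[j] = (pre[j] * base + front) % m
            suf[j] = (back * power[j] + suf[j]) % m
            power[j] = power[j] * base % m
        if pre == suf:  # both hash values must agree
            lengths.append(k + 1)
    return lengths


print(matching_prefix_lengths_double_hash("abacaba"))  # [1, 3, 7]
```

A false positive now requires a collision under both moduli at once, which is far less likely than under a single one.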

My approach takes O(N) time to pre-process the string, then at most O(N) further steps to collect the answer (one step per failure link followed).

The pre-processing is basically the failure-table construction from KMP, run on the entire string except its last character ("abacab" in your example). In the resulting table, the value at the last index (i.e. index 5, the character 'b') is 2. This means the longest proper prefix that matches a substring ending at that 'b' has length 2 ("ab"). Now, if the last character of the full string matches the 3rd character of the prefix ('a'), you have found a suffix equal to a prefix ("aba").

KMP stops right there. But you want all the matches. So instead of only the maximum match that ends at the 'b' (2), you need ALL the prefix matches that end at that 'b'. So you keep going through KMP's inner loop and, as above, for each match length ending at 'b' (which can be zero), check whether the next character of the prefix equals the last character of the string.

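def build_table(pat):
    # table[i] = length of the longest proper prefix of pat[:i+1]
    # that is also a suffix of it (the KMP failure function)
    table = [0]*len(pat)
    p = 0 #length of the current match
    for i in range(1,len(pat)):
        while p>0 and pat[p]!=pat[i]: #KMP's loop I was talking about
            p = table[p-1] #fall back to the next shorter match

        if pat[p]==pat[i]:
            table[i] = p+1
            p+=1

    return table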

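pat = "abacaba"
table = build_table(pat[:-1]) #build table for "abacab", i.e. everything except the last character

ans = [] #to store answers
p = len(pat)-1 #length of the table-building string, so table[p-1] is its last entry (index 5, 'b')
while p>0: #the main loop
    p = table[p-1] #next shorter prefix match ending just before the last character
    if pat[p]==pat[-1]: #can that match be extended by the last character?
        ans.append(p+1)

print(ans)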

which for "abacaba" prints [3, 1], and for "abracadabra" prints [4, 1]. Treat the entire length as a special case.

(Note the similarity between my while loop and KMP's loop. If you are still confused, I strongly suggest reading KMP thoroughly until you understand it. It's easy to get an overall idea of it, but understanding it deeply is really hard, and crucial to answering questions like this.)
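
As a variation on the above (not the exact code from this answer), the failure table can be built over the whole string instead, and the failure links followed from its last entry. The full length then needs no separate handling beyond being added up front. This is a minimal sketch; the names `prefix_function` and `matching_prefix_lengths` are illustrative.

```python
def prefix_function(s):
    """table[i] = length of the longest proper prefix of s[:i+1] that is also its suffix."""
    table = [0] * len(s)
    p = 0
    for i in range(1, len(s)):
        while p > 0 and s[p] != s[i]:
            p = table[p - 1]  # fall back to the next shorter match
        if s[p] == s[i]:
            p += 1
        table[i] = p
    return table


def matching_prefix_lengths(s):
    if not s:
        return []
    table = prefix_function(s)
    lengths = [len(s)]  # the whole string always matches itself
    p = table[-1]       # longest proper prefix that is also a suffix
    while p > 0:
        lengths.append(p)
        p = table[p - 1]  # next shorter prefix that is also a suffix
    return lengths[::-1]  # increasing order


print(matching_prefix_lengths("abacaba"))      # [1, 3, 7]
print(matching_prefix_lengths("abracadabra"))  # [1, 4, 11]
```

Every value visited by the while loop is a valid answer here, so the collection step costs time proportional to the size of the result.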

Shihab Shahriar Khan