0

I am trying to find an algorithm that culd return the length of the shortest cyclic sub string in a larger cyclic string.

A cyclic string would be defined as a concatenation of tow or more identicle strings, e.g. "abababab", or "aaaa"...

Now in a given for example a string T = "abbcabbcabbcabbc" there is a cycle of the pattern "abbc" but the shortest cyclic sub string would be "bb".

Mortalus
  • 10,574
  • 11
  • 67
  • 117

2 Answers2

1

If you're just looking for a substring that appears more than once:

Build a Suffix tree from the string.

While creating the suffix tree, you can count re-occurrences of every substring and save it on the number of occurrences on the node.

Then just do a BFS search on the tree (which will give you a layered search, from shorter to longer strings) and find the first substring which is longer than 1 that occurred more than once.

Total complexity: O(n) where n is the length of the string

Edit:

The paths from the root to the leaves have a one-to-one relationship with the suffixes of S

You can implement the tree that each node contains one letter, that will give you better granularity and allow you to see all the substrings by length.

Here's a suffix tree of banana where every node contains one letter, you can see that you have all the substrings there.
enter image description here

If you'll look at the applications section of the suffix tree, you'll see that it is used for exactly this kind of tasks - finding stuff about substrings.

Look at the image from the root, you can see ALL the substrings start from the root (BFS list):

b
a
n
ba
an
na
ban
ana
nan
bana
anan
nana
banan
anana
banana
Yochai Timmer
  • 48,127
  • 24
  • 147
  • 185
  • but the suffix tree won't give you all the available sub strings it will only give you all the suffixes of a specific string. hence if the shortest cyclic string is in the middle of the string e.g "affgaffg" you wont be able to find the ff sub string becuse it's not a suffix of the whole string. – Mortalus Jun 23 '11 at 20:04
  • yeah but again you get only the prefixes of a string how would a BFS provide me with the pattern in the middle of the string "AbsbsbsR" ? – Mortalus Jun 23 '11 at 20:16
  • no, you weren't right. it IS a suffix tree. I've had more time today to go over this again, edited the answer with an example. – Yochai Timmer Jun 24 '11 at 04:48
0

Let me call "abbc" the generator in your example - i.e. the string that you repeat in order to get the bigger string.

The very first observation is that the smaller string should be made by repeating some substring twice.

It's clear that the smallest string should be smaller than the generator repeated twice (2*generator), because 2*generator is cyclic.

Now note that you only need to consider the string obtained by taking the generator 3 times, when searching for smaller cyclic string. Indeed, if the smallest is not there, but it is in the 4*generator, then it must span at least two generators, but then it wouldn't be the smallest.

So now lets assume the bigger string is 3*generator (or 2*generator). Also it's clear that if the generator has only different digits, then the answer is 2*generator. If not then you just need to find all pairs of identical characters in the bigger string say at position i and j and check whether the string starting a i, which is 2*(j-i) long is cyclic. If you try them in order of increasing j-i, then you can stop after the first success.

Petar Ivanov
  • 91,536
  • 11
  • 82
  • 95
  • what about the give string "afgcfgcrafgcfgcr" note that the generator is "afgcfgcr" and the bigger string is 2*generator but the smallest cycle is "fgcfgc". – Mortalus Jun 23 '11 at 20:07
  • Or take of example the given string "abbbaabbba" the generator is "abba" but the smalest cycle is produced by the concat of 2 generator because there is a suffix that is equal to the prefix. – Mortalus Jun 23 '11 at 20:09
  • these two examples will be covered by my last paragraph, when you will try all pair of identical digits in order of how far they are from each other – Petar Ivanov Jun 23 '11 at 20:19