I guess you want the longest substring possible that follows this pattern.
The first thing to do is to build a Suffix tree of the input string. Using Ukkonen's algorithm, this is O(n).
Now, how does the condition you provided translate in the suffix tree? First things first, you are looking for a repeated substring [1]. Repeated substrings will appear as internal nodes of the suffix tree. The maximum number of nodes in a suffix tree built from a n-char string is 2n - 1.
You can build a Max-Heap containing such repeated substrings, using their length (number of chars). You do NOT keep substrings of length superior to N/2 (see [1]). This is O(N) where N is the number of internal nodes of the suffix tree. For any suffix tree:
0 ≤ N ≤ n - 2
Now, you take the max out of the priority queue and process the internal node i you obtained:
- Let Si be the substring related to i, k = 0 and curnode = i
- While k < length(Si)
- If the key from i to a child of i is equal to Si[k], then k = k+1
- Else break the loop.
- If k == length(Si), then the substring is a match. Else, you proceed to the next substring.
Complexity summary
Let n be the length of the query string.
- Building the suffix tree : O(n)
- Building the Max-heap of repeated substrings: [3]
- Identifying the repeated substrings (ie. internal nodes) and storing them in an array: O(n)
- Heapify the array: O(n)
- Finding the best match: O(n².log(n)) [2]
Hence the overall worst case complexity is the sum of the above and is O(n².log(n)).
Notes
I made the algorithm above... Hence it is suboptimal, if you are brave enough, you can go through this paper that describes a linear time algorithm! In any case, Suffix trees are a key to this problem so I suggest you study them thoroughly.
[1]: Warning, repeated substrings may partially overlap!
[2]: Actually, the worst case complexity is better than this very naive upper bound but I don't know how to prove it (yet?!). For example, if there were n - 2 internal nodes, that would mean that the original string consists of n occurrences of the same character. In that case, the first substring we check is a match => it's O(n.log(n)).
[3]: If we replace the heap construction by a regular sort (O(n.log(n))), the final comparison step runs in O(n²) instead of O(n².log(n)) ... Taking down the overall complexity between O(n.log(n)) (due to the sorting step) and O(n²).