We can solve this in O(n * log(size_of_alphabet)). Let f(i) represent the number of valid substrings ending at the ith character. Then:

f(i) = 1 + f(j - 1)

where j is the rightmost index smaller than or equal to i such that s[j..i] is a valid substring, and f(j - 1) only counts valid substrings inside the current window (so it is 0 when j - 1 falls outside the window). If no such j exists, f(i) = 0. Call s[j..i] the "minimal" valid substring ending at index i.
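To make f concrete, here is a brute-force Python sketch for small inputs. It assumes that "valid" means every distinct character in the substring occurs exactly k times; that definition is not stated in this section, but it reproduces the worked example's total of 7. The helper names are hypothetical.

```python
from collections import Counter


def is_valid(t: str, k: int) -> bool:
    # Assumption: a substring is valid when every distinct character
    # in it occurs exactly k times.
    return bool(t) and all(v == k for v in Counter(t).values())


def f_brute_force(s: str, k: int) -> list[int]:
    """f[i] = number of valid substrings ending at index i (quadratic checks)."""
    return [sum(is_valid(s[j:i + 1], k) for j in range(i + 1))
            for i in range(len(s))]


def minimal_start(s: str, k: int, i: int):
    """Rightmost j <= i with s[j..i] valid, or None if there is none."""
    for j in range(i, -1, -1):
        if is_valid(s[j:i + 1], k):
            return j
    return None


f = f_brute_force("acbbaacc", 2)
print(f)       # [0, 0, 0, 1, 0, 2, 1, 3]
print(sum(f))  # 7
print(minimal_start("acbbaacc", 2, 7))  # 6, i.e. the minimal substring "cc"
```

For instance, f(7) = 1 + f(5) = 3, because the minimal valid substring ending at index 7 starts at j = 6.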
An invariant for our window is that no character occurs in it more than k times: if a character is seen k + 1 times, we move the left bound just past that character's leftmost instance in the window. This guarantees that no two valid substrings in a concatenation of valid substrings inside the current window can share a character (a shared character would occur 2k > k times in the window), so the concatenation remains valid.
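The invariant can be checked by brute force. The sketch below moves the left bound as described (dropping characters from the left until c is back to k occurrences, which stops just past c's leftmost instance) and then asserts that any two adjacent valid substrings inside the window concatenate to a valid one. As above, "valid" is assumed to mean every distinct character occurs exactly k times, and the function names are hypothetical.

```python
from collections import Counter
import random


def is_valid(t: str, k: int) -> bool:
    # Assumption: valid = every distinct character occurs exactly k times.
    return bool(t) and all(v == k for v in Counter(t).values())


def window_lefts(s: str, k: int) -> list[int]:
    """Left bound of the window after processing each index i."""
    count = Counter()
    left = 0
    lefts = []
    for i, c in enumerate(s):
        count[c] += 1
        # On the (k+1)th instance of c, drop characters from the left;
        # the loop stops right after c's leftmost instance is removed.
        while count[c] > k:
            count[s[left]] -= 1
            left += 1
        lefts.append(left)
    return lefts


def check_concatenation(s: str, k: int) -> None:
    """Assert: adjacent valid substrings inside the window concatenate validly."""
    for i, left in enumerate(window_lefts(s, k)):
        for mid in range(left + 1, i + 1):
            if not is_valid(s[mid:i + 1], k):
                continue
            for a in range(left, mid):
                if is_valid(s[a:mid], k):
                    assert is_valid(s[a:i + 1], k), (s, k, a, mid, i)


print(window_lefts("acbbaacc", 2))  # [0, 0, 0, 0, 0, 1, 1, 2]
rng = random.Random(0)
for _ in range(300):
    s = "".join(rng.choice("abc") for _ in range(rng.randint(1, 10)))
    check_concatenation(s, rng.randint(1, 3))
```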
Each time we reach the kth instance of character c, the rightmost index j <= i for which s[j..i] is a valid substring must lie to the right of the rightmost instance of every character in the window whose count is less than k: such a character occurs fewer than k times in s[j..i], so it cannot occur there at all. To find the rightmost such index, we may also need to move ahead of valid neighbouring substrings already seen in the window.
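This necessary condition on j can also be checked by brute force. At every kth instance, the sketch below takes the rightmost index of any window character whose count is below k (the quantity the heap root will track, computed here with a plain max over a dict rather than a heap) and asserts that the rightmost valid start j lies strictly to its right. Same validity assumption as before; the names are hypothetical.

```python
from collections import Counter
import random


def is_valid(t: str, k: int) -> bool:
    # Assumption: valid = every distinct character occurs exactly k times.
    return bool(t) and all(v == k for v in Counter(t).values())


def check_start_bound(s: str, k: int) -> None:
    count = Counter()  # character counts inside the window
    last = {}          # char -> rightmost index seen so far
    left = 0
    for i, c in enumerate(s):
        count[c] += 1
        last[c] = i
        while count[c] > k:  # (k+1)th instance: shrink the window
            count[s[left]] -= 1
            left += 1
        if count[c] != k:    # only a kth instance can end a valid substring
            continue
        # Rightmost instance of any window character with count below k;
        # left - 1 stands in for "no such character".
        bound = max((last[d] for d in count if 0 < count[d] < k),
                    default=left - 1)
        starts = [j for j in range(left, i + 1) if is_valid(s[j:i + 1], k)]
        if starts:
            assert max(starts) > bound, (s, k, i)


rng = random.Random(1)
for _ in range(300):
    s = "".join(rng.choice("abcd") for _ in range(rng.randint(1, 12)))
    check_start_bound(s, rng.randint(1, 3))
```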
To find that index, we can maintain an indexed max-heap that stores the rightmost instance of each distinct character currently in the window with a count less than k, prioritised by index, so that our j always lies to the right of the heap's root (or the heap is empty). Because the heap is indexed, we can remove a specific element in O(log(size_of_alphabet)); it never holds more than one entry per character.
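Python's `heapq` has no indexed variant, so here is a minimal indexed max-heap sketch (a hypothetical class, not a library API). It keeps at most one (index, char) entry per character plus a char-to-slot map, so removing a given character or replacing its index costs O(log size), and the size is bounded by the alphabet. The usage lines replay steps 0 to 4 of the worked example.

```python
class IndexedMaxHeap:
    """Max-heap of (index, char) entries, at most one entry per char.

    Hypothetical helper: `pos` maps each char to its slot in `heap`, which
    is what lets remove(char) run in O(log size) instead of O(size).
    """

    def __init__(self) -> None:
        self.heap: list[tuple[int, str]] = []
        self.pos: dict[str, int] = {}

    def __len__(self) -> int:
        return len(self.heap)

    def peek(self):
        """Root entry (largest index), or None if the heap is empty."""
        return self.heap[0] if self.heap else None

    def _swap(self, a: int, b: int) -> None:
        h = self.heap
        h[a], h[b] = h[b], h[a]
        self.pos[h[a][1]] = a
        self.pos[h[b][1]] = b

    def _sift_up(self, i: int) -> None:
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                return
            self._swap(i, largest)
            i = largest

    def remove(self, char: str) -> None:
        """Delete char's entry if present."""
        i = self.pos.pop(char, None)
        if i is None:
            return
        last = self.heap.pop()
        if i < len(self.heap):
            # Move the former last entry into the hole and restore order.
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(self.pos[last[1]])

    def set(self, char: str, index: int) -> None:
        """Insert char at index, replacing any older entry for char."""
        self.remove(char)
        self.heap.append((index, char))
        self.pos[char] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)


heap = IndexedMaxHeap()
heap.set("a", 0)
heap.set("c", 1)
heap.set("b", 2)
print(heap.peek())  # (2, 'b')
heap.remove("b")    # step 3: the kth instance of b
print(heap.peek())  # (1, 'c')
heap.remove("a")    # step 4: the kth instance of a
print(heap.peek(), len(heap))  # (1, 'c') 1
```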
We also keep the left and right boundary indexes of the valid minimal substrings already seen in the window. A double-ended queue gives O(1) updates, since a new valid substring can appear to the right of existing ones or envelop them. We also keep a hashmap of the left boundaries for O(1) lookup.
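As a rough illustration of this bookkeeping, the sketch below replays the queue states from steps 3 to 7 of the worked example with `collections.deque` and a dict. Each boundary carries a label, which is one possible way to disambiguate enveloping intervals. The labels and helper functions are hypothetical, and only the two insertion cases plus the single removal the example needs are shown; this is not a general implementation.

```python
from collections import deque

queue = deque()  # (index, label) boundaries, left to right
starts = {}      # left boundary index -> label, for O(1) lookup


def add_right(label: str, start: int, end: int) -> None:
    # New minimal substring to the right of every stored one: O(1).
    queue.append((start, label))
    queue.append((end, label))
    starts[start] = label


def add_enveloping(label: str, start: int, end: int) -> None:
    # New minimal substring enveloping every stored one: O(1) at both ends.
    queue.appendleft((start, label))
    queue.append((end, label))
    starts[start] = label


def drop_enveloping() -> None:
    # Step 7 of the example only: the left bound passed the enveloping
    # substring, whose boundaries sit at the two ends of the deque.
    start, label = queue.popleft()
    _, end_label = queue.pop()
    assert end_label == label
    del starts[start]


add_right("A", 2, 3)       # step 3
add_right("B", 4, 5)       # step 5
add_enveloping("C", 1, 6)  # step 6
print(list(queue))  # [(1, 'C'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B'), (6, 'C')]
drop_enveloping()          # step 7: left bound moves to 2
add_right("D", 6, 7)       # step 7: new substring (6, 7)
print(list(queue))  # [(2, 'A'), (3, 'A'), (4, 'B'), (5, 'B'), (6, 'D'), (7, 'D')]
print(2 in starts, 1 in starts)  # True False
```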
Additionally, we must keep a count of each distinct character in the window, to maintain our invariant that no count exceeds k, and each character's leftmost index in the window, for the valid-substring precondition.
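One way to keep both the counts and the leftmost indexes is a deque of positions per character: the count is the deque's length and the leftmost index is its front. The sketch below assumes that representation (the function name is hypothetical); rebuilding the two dicts at every step costs O(size_of_alphabet) and is only there to expose the state.

```python
from collections import defaultdict, deque


def window_states(s: str, k: int):
    """Yield (i, left, counts, leftmost) after processing each index i."""
    positions = defaultdict(deque)  # char -> its indices inside the window
    left = 0
    for i, c in enumerate(s):
        positions[c].append(i)
        if len(positions[c]) == k + 1:
            # (k+1)th instance: advance left just past c's leftmost instance.
            # Every index we pass is the front entry of its own deque.
            cut = positions[c][0]
            while left <= cut:
                positions[s[left]].popleft()
                left += 1
        counts = {ch: len(q) for ch, q in positions.items() if q}
        leftmost = {ch: q[0] for ch, q in positions.items() if q}
        yield i, left, counts, leftmost


for i, left, counts, leftmost in window_states("acbbaacc", 2):
    print(i, left, counts, leftmost)
```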
Procedure:

    for each index i in s:
        let c be the character s[i]
        if s[i] is the (k+1)th instance of c in the window:
            move the left bound of the window just past the leftmost
            instance of c in the window, removing from the heap every
            element whose rightmost instance we passed while updating
            the window, and adding to the heap the rightmost instance
            of each character whose count falls below k as the left
            bound moves. If the bound moves past the left boundary of
            a valid minimal substring, remove its boundaries from the
            queue and its left boundary from the hashmap.
        if s[i] is the kth instance of c:
            remove the previous instance of c from the heap.
            if the leftmost instance of c in the window is to the right
            of the heap root (treat an empty heap's root index as
            left - 1, one before the window):
                if (root_index + 1) is the left bound of a valid
                minimal substring in our queue:
                    we must be adding to the right of all of them, so
                    add a new valid minimal substring that starts at
                    the index after the rightmost of those and ends at i
                otherwise:
                    add a new valid minimal substring starting at
                    (root_index + 1) and ending at i
        otherwise:
            remove the previous instance of c from the heap and
            insert this one.

For example (result is the running total of f(i), i.e. the previous result + 1 + f(j - 1)):

    01234567
    acbbaacc   k = 2

    0 a  heap: (0 a)
    1 c  heap: (1 c) <- (0 a)
    2 b  heap: (2 b) <- (1 c) <- (0 a)
    3 b  kth instance, remove (2 b)
         heap: (1 c) <- (0 a)
         leftmost instance of b is to the right of the heap root.
         check root + 1 = 2, which is not a left bound in the queue,
         so add the new valid substring to the queue
         queue: (2, 3)
         result: 1 + 0 = 1
    4 a  kth instance, remove (0 a)
         heap: (1 c)
         queue: (2, 3)
         result: 1
         leftmost instance of a is to the left of the heap root, so continue
    5 a  (k+1)th instance, move the left border of the window to index 1
         heap: (1 c)
         queue: (2, 3)
         result: 1
         (5 a) is now the kth instance of a, and its leftmost instance
         is to the right of the heap root.
         check root + 1 = 2, which is the left bound of a valid
         substring in the queue, so add the new substring after it
         heap: (1 c)
         queue: (2, 3) -> (4, 5)
         result: 1 + 1 + 1 = 3
    6 c  kth instance, remove (1 c)
         heap: empty
         add the new substring (1, 6), which envelops the others
         queue: (1) -> (2, 3) -> (4, 5) -> (6)
         (for simplicity, the queue here is not labeled; labels may be
         needed for the split intervals)
         result: 3 + 1 + 0 = 4
    7 c  (k+1)th instance, move the left border of the window to index 2
         and update the queue
         heap: empty
         queue: (2, 3) -> (4, 5)
         result: 4
         (7 c) is now the kth instance of c
         heap: empty
         add the new substring (6, 7) to the queue
         queue: (2, 3) -> (4, 5) -> (6, 7)
         result: 4 + 1 + 2 = 7
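To cross-check the result column, the sketch below evaluates f(i) = 1 + f(j - 1) directly with brute-force helpers instead of the heap and queue, so it is roughly cubic and only meant for small inputs. f(j - 1) is counted by walking back through adjacent minimal valid substrings while they start inside the current window, which is the chain the queue of minimal substrings is meant to record. Same validity assumption as above; the helper names are hypothetical.

```python
from collections import Counter


def is_valid(t: str, k: int) -> bool:
    # Assumption: valid = every distinct character occurs exactly k times.
    return bool(t) and all(v == k for v in Counter(t).values())


def running_results(s: str, k: int) -> list[int]:
    n = len(s)
    # Start of the minimal valid substring ending at each p, or None.
    minimal = []
    for p in range(n):
        starts = [j for j in range(p + 1) if is_valid(s[j:p + 1], k)]
        minimal.append(max(starts) if starts else None)

    def chain(p: int, left: int) -> int:
        # Valid substrings ending at p that start inside [left, p]:
        # walk back through adjacent minimal valid substrings.
        total = 0
        while p >= left and minimal[p] is not None and minimal[p] >= left:
            total += 1
            p = minimal[p] - 1
        return total

    count = Counter()
    left = 0
    result = 0
    results = []
    for i, c in enumerate(s):
        count[c] += 1
        while count[c] > k:  # keep the window invariant
            count[s[left]] -= 1
            left += 1
        j = minimal[i]
        if j is not None:
            result += 1 + chain(j - 1, left)  # f(i) = 1 + f(j - 1)
        results.append(result)
    return results


print(running_results("acbbaacc", 2))  # [0, 0, 0, 1, 1, 3, 4, 7]
```

The printed list matches the result line after each step of the trace above.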