
I am having trouble understanding why the solution to LeetCode's Repeated String Match problem only goes up to q + 1 repeats of A (if A.length() < B.length()) when checking whether B is a substring of the repeated A.

I have read other Stack Overflow answers as well as the LeetCode discussion pages, but I am still unable to fully understand the solution.

The algorithm is explained as:

Imagine we wrote S = A+A+A+.... If B is to be a substring of S, we only need to check whether some S[0:], S[1:], ..., S[len(A) - 1:] starts with B, as S is long enough to contain B, and S has period at most len(A).
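The period argument can be made concrete. A minimal sketch, assuming non-empty strings: because S has period len(A), the character S[i + j] is simply A[(i + j) % len(A)], so each start offset 0 .. len(A) - 1 can be tested without building S at all. This swaps the StringBuilder/indexOf approach for a direct modular-index search; the names PeriodicMatch, firstOffset and copiesNeeded are hypothetical and not part of the LeetCode solution.

```java
// Hypothetical helper: searches the infinite string S = A + A + A + ...
// by trying only the start offsets 0 .. len(A) - 1. Assumes non-empty a.
class PeriodicMatch {
    // Smallest offset i in [0, len(a)) such that S read from position i
    // starts with b, or -1 if there is none.
    static int firstOffset(String a, String b) {
        int n = a.length();
        for (int i = 0; i < n; i++) {
            boolean ok = true;
            for (int j = 0; j < b.length(); j++) {
                // S has period n, so S[i + j] == a[(i + j) % n].
                if (a.charAt((i + j) % n) != b.charAt(j)) {
                    ok = false;
                    break;
                }
            }
            if (ok) return i;
        }
        return -1;
    }

    // Copies of a needed to hold positions 0 .. i + len(b) - 1,
    // i.e. ceil((i + len(b)) / len(a)) for the first matching offset i.
    static int copiesNeeded(String a, String b) {
        int i = firstOffset(a, b);
        if (i < 0) return -1;
        return (i + b.length() + a.length() - 1) / a.length();
    }

    public static void main(String[] args) {
        System.out.println(firstOffset("abcd", "cdabcdab"));  // 2
        System.out.println(copiesNeeded("abcd", "cdabcdab")); // 3
        System.out.println(firstOffset("exam", "example"));   // -1
    }
}
```

Since ceil((i + len(B)) / len(A)) grows with i, the smallest matching offset also gives the smallest number of copies.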

Now, suppose q is the least number for which len(B) <= len(A * q). We only need to check whether B is a substring of A * q or A * (q+1). If we try k < q, then B has larger length than A * k and therefore can't be a substring. When k = q+1, A * k is already big enough to try all positions for B; namely, S[i:i+len(B)] == B for i = 0, 1, ..., len(A) - 1.
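The q + 1 bound in the last sentence can be written out as a short derivation. Here n = len(A) and m = len(B) are shorthand introduced for this sketch, and a match is assumed to start at an offset i with 0 <= i <= n - 1:

```latex
\begin{aligned}
q &= \left\lceil \tfrac{m}{n} \right\rceil
  \quad\Longrightarrow\quad (q-1)\,n < m \le q\,n, \\
k < q &\;\Longrightarrow\; k\,n \le (q-1)\,n < m
  \quad\text{(too short to contain } B\text{)}, \\
0 \le i \le n-1 &\;\Longrightarrow\; i + m - 1 \le (n-1) + q\,n - 1 = (q+1)\,n - 2 < (q+1)\,n .
\end{aligned}
```

So every occurrence that starts inside the first copy of A ends inside A * (q + 1), whose last index is (q + 1) n - 1.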

The implementation is as follows:

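class Solution {
  public int repeatedStringMatch(String A, String B) {
      int q = 1;
      StringBuilder S = new StringBuilder(A);
      // Append copies of A until S is at least as long as B;
      // q is the number of copies S now holds.
      for (; S.length() < B.length(); q++) S.append(A);
      // Fewer than q copies are too short, so q is the first candidate.
      if (S.indexOf(B) >= 0) return q;
      // q + 1 copies leave room for B at every start offset 0..len(A)-1.
      if (S.append(A).indexOf(B) >= 0) return q+1;
      // Extra copies would only repeat offsets that were already checked.
      return -1;
  }
}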

I understand that when A.length() < B.length(), B cannot be a substring of A, so we need to keep appending copies of A until the repeated string's length is at least B.length(). But once that is the case, why do we only need to add one more copy of A to find the minimum number of repeats?

My intuition is that after A is repeated some number of times, a pattern is established, and if B does not fit into that pattern/sequence of characters, then no matter how many times you repeat A, B will not be a substring of the repeated A.

However, I just don't see why the answer has to be exactly the number of copies needed to match B's length (q), or one more copy than that (q + 1).
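One way to gain confidence in the q / q + 1 bound (not a proof) is to compare the solution against a brute force that keeps appending copies well past q + 1. This is a sketch under the assumption that exhaustive testing over a tiny alphabet is representative; the class name CrossCheck, the {'a', 'b'} alphabet and the length limits are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: compares the q / q + 1 shortcut with a brute force
// that tries up to q + extra copies, over all short strings on {'a', 'b'}.
class CrossCheck {
    // Same logic as the Solution in the question.
    static int shortcut(String a, String b) {
        int q = 1;
        StringBuilder s = new StringBuilder(a);
        for (; s.length() < b.length(); q++) s.append(a);
        if (s.indexOf(b) >= 0) return q;
        if (s.append(a).indexOf(b) >= 0) return q + 1;
        return -1;
    }

    // Brute force: smallest k in 1 .. q + extra with b inside a repeated k times.
    static int bruteForce(String a, String b, int extra) {
        int q = (b.length() + a.length() - 1) / a.length(); // ceil(len(b) / len(a))
        StringBuilder s = new StringBuilder();
        for (int k = 1; k <= q + extra; k++) {
            s.append(a);
            if (s.indexOf(b) >= 0) return k;
        }
        return -1;
    }

    // Every string over {'a', 'b'} with length 1 .. maxLen.
    static List<String> allStrings(int maxLen) {
        List<String> out = new ArrayList<>();
        List<String> layer = new ArrayList<>();
        layer.add("");
        for (int len = 1; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : layer) {
                next.add(s + "a");
                next.add(s + "b");
            }
            out.addAll(next);
            layer = next;
        }
        return out;
    }

    // Number of (a, b) pairs on which the two methods disagree.
    static int countMismatches(int maxLen, int extra) {
        List<String> strs = allStrings(maxLen);
        int mismatches = 0;
        for (String a : strs)
            for (String b : strs)
                if (shortcut(a, b) != bruteForce(a, b, extra)) mismatches++;
        return mismatches;
    }

    public static void main(String[] args) {
        // Prints 0: going up to q + 5 copies never finds a match the shortcut missed.
        System.out.println(countMismatches(6, 5));
    }
}
```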

If someone could clear up this confusion for me, it would be much appreciated. Thank you.

ambition

1 Answer


I think you have understood most of it already, so let's go through a basic example.

ampleex
itsalreadyhereexample
exam

Let's say B is "example" and A is one of the strings above.


For the first case, A.length() == B.length() (so q = 1).
We check whether B is a substring of A and the answer is no.
So we append A once more and get "ampleexampleex",
and now the repeated string contains B, so the result is q + 1 = 2.


For the second case, A.length() > B.length().
We check whether B is a substring and find that A already contains B, so the result is 1.

(If B were not in A, we would still need to check whether it is in the repeated form A + A,
which is equivalent to the first case.)


For the third case, A.length() < B.length(),
so we repeat A until we cover the length of B
and get "examexam" (q = 2).

We see that B is not in there, so we add A once more,
and B is still not in there ("examexamexam"), so the result is -1.
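The three cases can be traced with a small variant of the question's Solution that also records every repeated string it checks. This is a sketch; WalkThrough and match are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the question's Solution that records each
// repeated string it checks, to follow the three cases step by step.
class WalkThrough {
    static int match(String a, String b, List<String> checked) {
        int q = 1;
        StringBuilder s = new StringBuilder(a);
        while (s.length() < b.length()) { // grow to at least len(b)
            s.append(a);
            q++;
        }
        checked.add(s.toString());        // q copies
        if (s.indexOf(b) >= 0) return q;
        s.append(a);
        checked.add(s.toString());        // q + 1 copies
        if (s.indexOf(b) >= 0) return q + 1;
        return -1;
    }

    public static void main(String[] args) {
        String b = "example";
        for (String a : new String[] {"ampleex", "itsalreadyhereexample", "exam"}) {
            List<String> checked = new ArrayList<>();
            int result = match(a, b, checked);
            System.out.println(a + " -> " + result + ", checked " + checked);
        }
    }
}
```

Running it prints:

ampleex -> 2, checked [ampleex, ampleexampleex]
itsalreadyhereexample -> 1, checked [itsalreadyhereexample]
exam -> -1, checked [examexam, examexamexam]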


The reason we need the extra copy is that B could be a more special case.
B could be something like "xamexame", basically a repetition of one of A's variations.

(Possible variations in this case would be repetitions of xame, amex or mexa, i.e. the rotations of exam.)

Such a B starts partway into a copy of A, so it can only appear in a repeated form that is longer than B, which is where the q + 1 comes from.
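To see this special case for every rotation of "exam" at once, here is a sketch that builds B as a rotation of A written twice and searches A, A + A, ... up to q + 1 copies. Rotations, rotation and minCopies are hypothetical names, and stopping the search at q + 1 relies on the bound discussed in the question.

```java
// Hypothetical sketch: B = (rotation of A) written twice, for every rotation.
class Rotations {
    // Rotation of a starting at index i, e.g. rotation("exam", 1) is "xame".
    static String rotation(String a, int i) {
        return a.substring(i) + a.substring(0, i);
    }

    // Smallest k in 1 .. q + 1 such that a repeated k times contains b, else -1.
    static int minCopies(String a, String b) {
        int q = (b.length() + a.length() - 1) / a.length(); // ceil(len(b) / len(a))
        StringBuilder s = new StringBuilder();
        for (int k = 1; k <= q + 1; k++) {
            s.append(a);
            if (s.indexOf(b) >= 0) return k;
        }
        return -1;
    }

    public static void main(String[] args) {
        String a = "exam";
        for (int i = 0; i < a.length(); i++) {
            String r = rotation(a, i);
            String b = r + r; // "examexam", "xamexame", "amexamex", "mexamexa"
            System.out.println(b + " needs " + minCopies(a, b) + " copies");
        }
    }
}
```

Only the unrotated "examexam" fits in q = 2 copies; "xamexame", "amexamex" and "mexamexa" each need q + 1 = 3.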

Let's look at the repetitions in more detail:
B's length can be written as (A.length() * k) + x, where k is the number of complete repetitions and x is in [0, A.length()].

A = exam
B = xame[xame]

B is still a repetition of (a rotation of) A, but every character in the last repetition, shown in brackets, is optional.

examexam
 xame
 xamex
 xamexa
 xamexam

examexamexam
 xamexame
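The alignment diagrams above can be regenerated with a short sketch that prints S and then each candidate B indented to the index returned by indexOf (its first occurrence). Alignment and render are hypothetical names, and a pattern that does not occur is marked instead of aligned.

```java
// Hypothetical helper that redraws the alignment diagrams.
class Alignment {
    // s on the first line, then each pattern indented to its first occurrence in s.
    static String render(String s, String... patterns) {
        StringBuilder out = new StringBuilder(s).append('\n');
        for (String p : patterns) {
            int at = s.indexOf(p);
            if (at < 0) {
                out.append(p).append(" (not found)\n");
                continue;
            }
            for (int i = 0; i < at; i++) out.append(' ');
            out.append(p).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("examexam", "xame", "xamex", "xamexa", "xamexam", "xamexame"));
        System.out.print(render("examexamexam", "xamexame"));
    }
}
```

With two copies the eight-character "xamexame" is reported as not found; with three copies it lines up at index 1.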

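Adding another "exam" to S won't change anything, since we have already covered all possibilities: with q + 1 copies, every starting offset inside the first copy of A already has room for all of B, and any later offset just repeats one of those (no new pattern will appear from now on).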
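The claim that no new pattern appears can be checked by collecting every window of length len(B) in A repeated k times: past q + 1 copies the set stops growing, because the window starting at offset j is identical to the one starting at j % len(A). A sketch with hypothetical names (Windows, windows), using A = "exam" and len(B) = 8, so q = 2.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: distinct length-m windows of A repeated k times.
class Windows {
    // All distinct substrings of length m in a repeated k times (sorted).
    static Set<String> windows(String a, int k, int m) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < k; i++) sb.append(a);
        String s = sb.toString();
        Set<String> out = new TreeSet<>();
        for (int i = 0; i + m <= s.length(); i++) out.add(s.substring(i, i + m));
        return out;
    }

    public static void main(String[] args) {
        for (int k = 2; k <= 5; k++) {
            System.out.println(k + " copies: " + windows("exam", k, 8));
        }
    }
}
```

Two copies contain a single window of length 8 ([examexam]), while three, four and five copies all give the same four windows, [amexamex, examexam, mexamexa, xamexame], one per rotation of "exam".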

If B is not in there, it can't be such a repetition. The other scenarios, where B could be a substring, have been covered by the first and second cases.


I hope going through these examples clears up your confusion. If not, just ask about the point you don't understand.

second
  • Thanks for the clarification. I get the special case where the beginning of A could be appended to the end of A and make B; however, my main point of confusion is why are we allowed to stop at q+1? What is it about going past q+1 that causes the redundancy? And how would I intuit that going forward? – ambition Jun 29 '19 at 23:12
  • I assume you don't have a problem with cases 1 & 2, so I only added more details for the 3rd case. – second Jun 30 '19 at 01:56