How do I check if a string is entirely made of the same substring?

Question

I have to create a function which takes a string, and it should return true or false based on whether the input consists of a repeated character sequence. The length of the given string is always greater than 1 and the character sequence must have at least one repetition.

"aa" // true (consists entirely of two copies of "a")
"aaa" // true (consists entirely of three copies of "a")
"abcabcabc" // true (consists entirely of three copies of "abc")

"aba" // false (there must be at least two identical substrings and nothing else)
"ababa" // false ("ab" occurs twice, but the trailing "a" is extra)

I have created the below function:

function check(str){
  if(!(str.length && str.length - 1)) return false;
  let temp = '';
  for(let i = 0;i<=str.length/2;i++){
    temp += str[i]
    //console.log(str.replace(new RegExp(temp,"g"),''))
    if(!str.replace(new RegExp(temp,"g"),'')) return true;
  }
  return false;
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

This check is only one part of the real problem, so I can't afford an inefficient solution like this. First of all, it loops through half of the string.

The second problem is that it calls replace() in each iteration, which makes it slow. Is there a better solution regarding performance?

This link may be useful to you. I always find geekforgeeks as a good source for algorithm problems - https://www.geeksforgeeks.org/find-given-string-can-represented-substring-iterating-substring-n-times/ — Leron, Apr 24 '19 at 06:10
I think it's a problem of checking whether a common substring occurs uniformly throughout — Kunal Mukherjee, Apr 24 '19 at 06:17
Do you mind if I borrow this and make it a coding challenge on the Programming Golf exchange site? — ouflak, Apr 24 '19 at 14:23
In case you're curious, https://codegolf.stackexchange.com/questions/184682/check-if-a-string-is-entirely-made-of-the-same-substring — ouflak, Apr 24 '19 at 14:55
For a performance comparison with bigger test data, see https://jsperf.com/reegx-and-loop/14 — Axel Podehl, Apr 24 '19 at 16:59
You can give [Neural Networks](https://en.wikipedia.org/wiki/Artificial_neural_network) a try if you accept a percentage of error in the predictions and you have a big set of data (input and output) to train the network. They are really fast at making predictions after the training procedure, but training the network takes a lot of time (and you need the background knowledge). If you are interested, there is a library for `Javascript`: [Brain.js](https://github.com/BrainJS/brain.js) — Shidersz, Apr 24 '19 at 17:22
@Shidersz Using Neural networks for this feels a bit like using a cannon to shoot a mosquito. — JAD, Apr 25 '19 at 09:30
Newer jsperf with 4 functions: https://jsperf.com/stackoverflow-question-55823298 — Salman A, Apr 25 '19 at 15:56
@JAD You could be right. However, I believe it would depend on the real application of what he needs; maybe his problem is more complex, not just this. Anyway, I was just giving another angle that can help to solve recognition/classification problems. — Shidersz, Apr 26 '19 at 02:21
[Here](https://codegolf.stackexchange.com/questions/37851/string-prototype-isrepeated)'s another relevant PPCG question thread from 2014. In particular, the rotation and regex solutions there are practically the same as the ones independently discovered here. — MultiplyByZer0, Apr 27 '19 at 08:13

score 190 · Accepted Answer · edited Dec 23 '22 at 08:50

There’s a nifty little theorem about strings like these.

A string consists of the same pattern repeated multiple times if and only if the string is a nontrivial rotation of itself.

Here, a rotation means deleting some number of characters from the front of the string and moving them to the back. For example, the string hello could be rotated to form any of these strings:

hello (the trivial rotation)
elloh 
llohe 
lohel 
ohell

To see why this works, first, assume that a string consists of k repeated copies of a string w. Then deleting the first copy of the repeated pattern (w) from the front of the string and tacking it onto the back will give back the same string. The reverse direction is a bit trickier to prove, but the idea is that if you rotate a string and get back what you started with, you can apply that rotation repeatedly to tile the string with multiple copies of the same pattern (that pattern being the string you needed to move to the end to do the rotation).
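
To make the rotation idea concrete, here is a minimal brute-force sketch that lists the nontrivial rotations mapping a string onto itself. The helper names `rotate` and `selfRotations` are made up for this illustration, and the loop is quadratic, so it is meant for experimenting with the theorem rather than as a solution.

```javascript
// Move the first k characters of s to the end (a rotation by k).
function rotate(s, k) {
  return s.slice(k) + s.slice(0, k);
}

// Collect every shift k in 1..s.length-1 for which rotating by k gives s back.
// (Hypothetical helper; brute force, for illustration only.)
function selfRotations(s) {
  const shifts = [];
  for (let k = 1; k < s.length; k++) {
    if (rotate(s, k) === s) shifts.push(k);
  }
  return shifts;
}

console.log(rotate('hello', 2));          // llohe
console.log(selfRotations('abcabcabc'));  // [ 3, 6 ]
console.log(selfRotations('hello'));      // []
```

A non-empty result corresponds to "the string is a nontrivial rotation of itself", which by the theorem is the same as "the string is a repeated pattern".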

Now the question is how to check whether this is the case. For that, there’s another beautiful theorem we can use:

If x and y are strings of the same length, then x is a rotation of y if and only if x is a substring of yy.

As an example, we can see that lohel is a rotation of hello as follows:

hellohello
   ^^^^^
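
The substring test can be sketched in a couple of lines. `isRotation` is a hypothetical helper name; the length check matters because a shorter substring of yy would otherwise pass as well.

```javascript
// x is a rotation of y exactly when both have the same length
// and x occurs somewhere inside y + y.
function isRotation(x, y) {
  return x.length === y.length && (y + y).includes(x);
}

console.log(isRotation('lohel', 'hello'));          // true
console.log(('hello' + 'hello').indexOf('lohel'));  // 3, the position marked above
console.log(isRotation('olleh', 'hello'));          // false
```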

In our case, we know that every string x will always be a substring of xx (it’ll appear twice, once at each copy of x). So basically we just need to check if our string x is a substring of xx without allowing it to match at the first or halfway character. Here’s a one-liner for that:

function check(str) {
    return (str + str).indexOf(str, 1) !== str.length;
}

Assuming indexOf is implemented using a fast string matching algorithm, this will run in time O(n), where n is the length of the input string.
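
As a possible extension (not part of the one-liner itself), the index returned by indexOf can also be used to recover the repeating unit. If the string first reappears inside its doubled copy at a position p smaller than its length, then rotating by p leaves it unchanged, and the smallest such p divides the length. A sketch, with `smallestUnit` as an assumed helper name:

```javascript
// Return the shortest substring whose repetition forms str,
// or null when str is not a repetition of anything shorter.
function smallestUnit(str) {
  const p = (str + str).indexOf(str, 1); // first match after position 0
  return p < str.length ? str.slice(0, p) : null;
}

console.log(smallestUnit('abababab'));  // ab
console.log(smallestUnit('abcabcabc')); // abc
console.log(smallestUnit('aba'));       // null
```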

edited Dec 23 '22 at 08:50

starball

answered Apr 25 '19 at 01:56

templatetypedef

13

Very nice! I've added it to the [jsPerf benchmark](https://jsperf.com/reegx-and-loop/23) page. – user42723 Apr 25 '19 at 02:23
10

@user42723 Cool! Looks like it's really, really fast. – templatetypedef Apr 25 '19 at 03:20
5

FYI: I had a hard time believing that sentence until I reversed the wording: "A string is a nontrivial rotation of itself if and only if it consists of the same pattern repeated multiple times". Go figure. – Axel Podehl Apr 25 '19 at 07:58
@templatetypedef This is the best solution. Because of its simplicity. Although Pranav C Balan solution is also a nice one. But I will prefer this one. – Maheer Ali Apr 25 '19 at 11:54
11

Do you have references to those theorems? – HRK44 Apr 25 '19 at 12:40
If this is for an algorithms test, I would look into a solution which avoids copying and concatenating the input string. You could avoid this allocation and concatenation by writing a cyclic `.indexOf` yourself. You would then run in constant space and linear time. – justinpc Apr 25 '19 at 13:45
string length is a property, not a method. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length – BurnsBA Apr 25 '19 at 18:12
4

I think the first statement is the same as "**Lemma 2.3**: If x and a rotation of x are equal, then x is a repetition" at https://doi.org/10.1016/j.tcs.2008.04.020 . See also: https://stackoverflow.com/a/2553533/1462295 – BurnsBA Apr 25 '19 at 18:32
1

I'll add that in this particular case str + str does not actually allocate new memory but rather (strings are trees in V8) just creates a bigger tree :D – Benjamin Gruenbaum Apr 26 '19 at 17:22
@BenjaminGruenbaum That's very cool! I'm teaching a course in data structures right now and didn't realize that people had used that tree representation in practice. Do you have a link or a reference I could look at to see how the V8 folks do this? – templatetypedef Apr 26 '19 at 17:50
@templatetypedef the two common string representations in JavaScript engines are ropes (SpiderMonkey) and Trees (like V8), ChakraCore (edge) actually does use string but flatten them first and does [boyer moore](https://github.com/Microsoft/ChakraCore/blob/95b7191c299eac2a7e8557937cb89f22c80bf20f/lib/Runtime/Library/JavascriptString.cpp#L1214-L1219) here (reference in linked file). Unfortunately V8 seems to also [Flatten](https://github.com/v8/v8/blob/master/src/objects/string.cc#L1152-L1153) the tree before searching and then just do a while loop (regular indexOf also does boyer-moore). – Benjamin Gruenbaum Apr 26 '19 at 18:01
So while it uses trees it does look like every strategy it uses ends up flattening them first. The documentation on what they do is [pretty interesting](https://github.com/v8/v8/blob/master/src/string-search.h#L23-L41), I guess (surprisingly) searching in strings is a lot more common than allocating them so they optimize for that. I'll try to dig which engine does the "search in trees". Maybe it was JavaScript core. – Benjamin Gruenbaum Apr 26 '19 at 18:03
Nope, JavaScriptCore does ropes (reference: https://github.com/WebKit/webkit/blob/master/Source/JavaScriptCore/runtime/JSString.cpp ) - so I guess V8 and ChakraCore are the only one doing trees and they flatten before doing indexOf - so I was full of shit before. I could have sworn at some point (maybe in the CrankShaft days) I recall seeing code that did indexOf with a pointer on the trees in a clever way but maybe it was one of those comits that never made it past the chromium commit queue. – Benjamin Gruenbaum Apr 26 '19 at 18:07
This solution is very succinct, but not efficient. – Ben Voigt Apr 27 '19 at 05:08
1

@BenVoigt I was very surprised to see the performance numbers on this. I’d have figured that since it’s doing a string concatenation followed by a string search (I assumed folks weren’t using Boyer-Moore or KMP) that this would be pretty slow. But it appears to work well in practice. Gotta hand it to the folks who make JS run so fast - they take their work seriously! If I were doing this in C or C++, though, I’d definitely opt for a more aggressively optimized implementation that harnessed the fact that the string we’re searching has nice structure. – templatetypedef Apr 27 '19 at 07:06
The jsPerf results on Chrome have to be wrong. The rotation algorithm is supposedly executing at 33,000,000 ops/sec; that's 100x faster than the other algorithms. Meanwhile, Firefox and Safari have it at 16,000 ops/sec and 48,000 ops/sec – second-fastest, but still 80% slower than the prime number algorithm. The numbers on Chrome are obviously inflated. I suspect some kind of caching or optimization is messing things up in the benchmark. – MultiplyByZer0 Apr 27 '19 at 07:12
The incorrect jsPerf benchmark I was talking about is [this one](https://jsperf.com/reegx-and-loop/23). On the other hand, [this other one](https://jsperf.com/stackoverflow-question-55823298) seems to be producing reasonable results, probably because the test strings are randomly generated and the result of each function is checked against a reference implementation. – MultiplyByZer0 Apr 27 '19 at 07:54
Vyacheslav Egorov, a V8 compiler engineer, has written extensively about how microbenchmarks can be misleading ([1](https://mrale.ph/blog/2012/12/15/microbenchmarks-fairy-tale.html), [2](https://mrale.ph/blog/2014/02/23/the-black-cat-of-microbenchmarks.html)) because JITs can optimize out significant chunks of benchmarked code unless steps are taken to prevent this. I think that is what's happening here. I have no proof of this, but I don't think the numbers can be that high naturally. – MultiplyByZer0 Apr 27 '19 at 08:01
@templatetypedef: The problem's that all the inputs are hard-coded, so the compiler can build string search tables at compile time where that part of the work doesn't get counted. A good benchmark needs to cause the tests to treat their input as variable. – Ben Voigt Apr 27 '19 at 14:16
@MultiplyByZer0 you can just run V8 from Node from the command line and trace the optimizations or build V8 without dead-code elimination (what you're seeing with the 33M numbers) and get the actual numbers. It's also worth mentioning that the fact chrome is cheating by caching and dead-code-elimination is a "good thing" and like you said, microbenchmarks are terrible :D – Benjamin Gruenbaum Apr 28 '19 at 09:26

Pranav C Balan · Answer 2 · 2019-04-25T09:48:55.210

68

You can do it by a capturing group and backreference. Just check it's the repetition of the first captured value.

function check(str) {
  return /^(.+)\1+$/.test(str)
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

In the above RegExp:

^ and $ are the start and end anchors; they force the pattern to match the whole string.
(.+) captures one or more characters (any character except line terminators such as \n).
\1 is a backreference to the first captured value, and \1+ checks for one or more repetitions of that captured value.
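
To see what the capturing group ends up holding, here is a small sketch that returns the captured unit instead of a boolean. It combines two suggestions from the comments below as assumptions of this example: the lazy quantifier `(.+?)`, which makes the group settle on the shortest unit, and the `s` (dotAll) flag from ES2018, which lets `.` match newlines. `repeatedUnit` is a made-up name.

```javascript
// Return the shortest repeating unit found by the regex, or null if none.
function repeatedUnit(str) {
  const m = /^(.+?)\1+$/s.exec(str);
  return m ? m[1] : null;
}

console.log(repeatedUnit('abababab'));              // ab
console.log(/^(.+)\1+$/.exec('abababab')[1]);       // abab (greedy: longest unit)
console.log(repeatedUnit('aba'));                   // null
console.log(/^(.+)\1+$/.test('a\na\n'));            // false: . does not match \n
console.log(repeatedUnit('a\na\n') === 'a\n');      // true with the s flag
```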

Regex explanation here

For RegExp debugging use: https://regex101.com/r/pqlAuP/1/debugger

Performance : https://jsperf.com/reegx-and-loop/13

edited Apr 25 '19 at 09:48

answered Apr 24 '19 at 06:08

Pranav C Balan

2

Can you explain to us what this line is doing return /^(.+)\1+$/.test(str) – Thanveer Shah Apr 24 '19 at 06:09
34

Also what is the complexity of this solution? I'm not absolutely sure but it doesn't seem to be much faster than the one the OP has. – Leron Apr 24 '19 at 06:13
8

@PranavCBalan I'm not good at algorithms, that's why I write in the comments section. However I have several things to mention - the OP already has a working solution so he is asking for one that will give him better performance and you haven't explained how your solution will outperform his. Shorter doesn't mean faster. Also, from the link you gave: `If you use normal (TCS:no backreference, concatenation,alternation,Kleene star) regexp and regexp is already compiled then it's O(n).` but as you wrote you are using backreference so is it still O(n)? – Leron Apr 24 '19 at 06:23
1

@MaheerAli : for me, it's showing existing code is `44% slower` – Pranav C Balan Apr 24 '19 at 06:29
1

@PranavCBalan Can't say whats the matter. I think I should add the whole problem that would make things clearer to you. – Maheer Ali Apr 24 '19 at 06:37
1

@PranavCBalan I need to check this for 1 billion numbers. I will obviously use other way for odd numbers. But Still linear time-complexity is not enough. – Maheer Ali Apr 24 '19 at 06:45
1

@MaheerAli : as another answer suggested try z-function algorithm : https://cp-algorithms.com/string/z-function.html#toc-tgt-2 – Pranav C Balan Apr 24 '19 at 06:53
5

You can use `[\s\S]` instead of `.` if you need to match newline characters in the same way as other characters. The dot character doesn't match newlines; the alternative matches all whitespace and non-whitespace characters, which means that newlines are included in the match. (Note that this is faster than the more intuitive `(.|[\r\n])`.) However, if the string definitely doesn't contain newlines, then the simple `.` will be fastest. Note this will be a lot simpler if [the dotall flag](https://github.com/tc39/proposal-regexp-dotall-flag) is implemented. – HappyDog Apr 24 '19 at 08:19
1

@PranavCBalan Your solution is still faster than the other answer which have z-function – Maheer Ali Apr 24 '19 at 12:43
1

@maheerali : nop... It gives better performance in different versn of chrome – Pranav C Balan Apr 24 '19 at 14:02
1

You can use `[^]` to match any character. – 12Me21 Apr 24 '19 at 16:01
1

For performance with bigger test data, check out https://jsperf.com/reegx-and-loop/13 ;-) – Axel Podehl Apr 24 '19 at 16:28
1

Any idea why this is ridiculously fast on Safari? On the same machine, Chrome does 2,250,117 ops/sec and Safari does 17,219,005 ops/sec. Maybe Safari is JITing the regex with LLVM? – Indiana Kernick Apr 25 '19 at 02:09
2

Isn't `/^(.+?)\1+$/` a little faster? (12 steps vs 20 steps) – online Thomas Apr 25 '19 at 08:39
1

So what's the worst case performance for the backtracking regex here? On first glance this seems to fall in the usual case that makes DDOSing services that use this without timeouts pretty simple. The z-function approach seems immune to this. Using 6 char long strings for performance tests without trying the usual worst case scenarios hides the problem. – Voo Apr 25 '19 at 09:32
1

@PedroLobito it should not pass it. `aabb` is not a repeating of the same substring. – online Thomas Apr 26 '19 at 10:48

MBo · Answer 3 · 2019-04-25T02:28:52.670

Perhaps the fastest algorithmic approach is building a Z-function in linear time:

The Z-function for this string is an array of length n where the i-th element is equal to the greatest number of characters starting from the position i that coincide with the first characters of s.

In other words, z[i] is the length of the longest common prefix between s and the suffix of s starting at i.

C++ implementation for reference:

vector<int> z_function(string s) {
    int n = (int) s.length();
    vector<int> z(n);
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i <= r)
            z[i] = min (r - i + 1, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            ++z[i];
        if (i + z[i] - 1 > r)
            l = i, r = i + z[i] - 1;
    }
    return z;
}

JavaScript implementation
Added optimizations - building a half of z-array and early exit

function z_function(s) {
  var n = s.length;
  var z = Array(n).fill(0);
  var i, l, r;
  // for our task we need only half of the z-array
  for (i = 1, l = 0, r = 0; i <= n / 2; ++i) {
    if (i <= r)
      z[i] = Math.min(r - i + 1, z[i - l]);
    while (i + z[i] < n && s[z[i]] == s[i + z[i]])
      ++z[i];

    // we can check the condition and return here
    if (z[i] + i === n && n % i === 0) return true;

    if (i + z[i] - 1 > r)
      l = i, r = i + z[i] - 1;
  }
  return false;
  //return z.some((zi, i) => (i + zi) === n && n % i === 0);
}
console.log(z_function("abacabacabac")); // true
console.log(z_function("abcab")); // false

Then you need to check the indexes i that divide n. If you find an i such that i + z[i] = n, then the string s can be compressed to length i and you can return true.

For example, for

string s = 'abacabacabac' with length n = 12

the z-array is

(0, 0, 1, 0, 8, 0, 1, 0, 4, 0, 1, 0)

and we can find that for

i = 4
i + z[i] = 4 + 8 = 12 = n
and
n % i = 12 % 4 = 0

so s can be represented as a substring of length 4 repeated three times.
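
The worked example can be reproduced with a short sketch that builds the full z-array (a direct JavaScript port of the C++ reference, without the early exit) and then lists every index that satisfies both conditions. The names `zArray` and `repeatLengths` are assumptions of this example.

```javascript
// Full z-array: z[i] is the length of the longest common prefix
// of s and the suffix of s that starts at i (z[0] is left as 0).
function zArray(s) {
  const n = s.length;
  const z = new Array(n).fill(0);
  for (let i = 1, l = 0, r = 0; i < n; i++) {
    if (i <= r) z[i] = Math.min(r - i + 1, z[i - l]);
    while (i + z[i] < n && s[z[i]] === s[i + z[i]]) z[i]++;
    if (i + z[i] - 1 > r) { l = i; r = i + z[i] - 1; }
  }
  return z;
}

// All lengths i (dividing n) such that s is its first i characters repeated.
function repeatLengths(s) {
  const n = s.length;
  const z = zArray(s);
  const lengths = [];
  for (let i = 1; i < n; i++) {
    if (n % i === 0 && i + z[i] === n) lengths.push(i);
  }
  return lengths;
}

console.log(zArray('abacabacabac').join(', ')); // 0, 0, 1, 0, 8, 0, 1, 0, 4, 0, 1, 0
console.log(repeatLengths('abacabacabac'));     // [ 4 ]
console.log(repeatLengths('aaaa'));             // [ 1, 2 ]
```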

Thanks for adding JavaScript stuff to Salman A and Pranav C Balan — MBo, Apr 24 '19 at 18:05
Alternate approach by avoiding an additional iteration `const check = (s) => { let n = s.length; let z = Array(n).fill(0); for (let i = 1, l = 0, r = 0; i < n; ++i) { if (i <= r) z[i] = Math.min(r - i + 1, z[i - l]); while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i]; // check condition here and return if (z[i] + i === n && n % i === 0) return true; if (i + z[i] - 1 > r) l = i, r = i + z[i] - 1; } // or return false return false; }` — Pranav C Balan, Apr 24 '19 at 18:14
Using the z-function is a good idea, but it is 'information -heavy', it contains a lot of information that is never used. — Axel Podehl, Apr 25 '19 at 07:37
@Axel Podehl Nevertheless, it treats string in O(n) time (each char is used at most two times). In any case we must check every char, so there is no theoretically faster algorithm (while optimized in-built methods might outperform). Also in the last edit I limited calculation by 1/2 of string length. — MBo, Apr 25 '19 at 07:43
Yes, these built-in functions in JavaScript are a lot faster than 'handwritten' code. I had to make the test cases really large for my recursive algorithm to outperform the regex implementation. — Axel Podehl, Apr 25 '19 at 07:52

user42723 · Answer 4 · 2019-04-27T00:40:13.910

23

I read the answer of gnasher729 and implemented it. The idea is that if there are any repetitions, then there must be (also) a prime number of repetitions.

function* primeFactors (n) {
    for (var k = 2; k*k <= n; k++) {
        if (n % k == 0) {
            yield k
            do {n /= k} while (n % k == 0)
        }
    }
    if (n > 1) yield n
}

function check (str) {
    var n = str.length
    primeloop:
    for (var p of primeFactors(n)) {
        var l = n/p
        var s = str.substring(0, l)
        for (var j=1; j<p; j++) {
            if (s != str.substring(l*j, l*(j+1))) continue primeloop
        }
        return true
    }
    return false
}

A slightly different algorithm is this:

function check (str) {
    var n = str.length
    for (var p of primeFactors(n)) {
        var l = n/p
        if (str.substring(0, n-l) == str.substring(l)) return true
    }
    return false
}

I've updated the jsPerf page that contains the algorithms used on this page.

edited Apr 27 '19 at 00:40

answered Apr 25 '19 at 01:07

user42723

This seems really fast since it skips unnecessary checks. – Pranav C Balan Apr 25 '19 at 02:51
1

Very nice, only I think I would check that the first letter reoccurs at the specified location before making the substring calls. – Ben Voigt Apr 27 '19 at 05:05
For people stumbling on `function*` for the first time like me, it's for declaring a generator, not a regular function. See [MDN](https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Instructions/function*) – Julien Rousé Apr 29 '19 at 15:44

score 18 · Answer 5 · answered Apr 24 '19 at 14:40

Assume the string S has length N and is made of duplicates of the substring s, then the length of s divides N. For example, if S has length 15, then the substring has length 1, 3, or 5.

Let S be made of (p*q) copies of s. Then S is also made of p copies of (s, repeated q times). We have therefore two cases: If N is prime or 1, then S can only be made of copies of the substring of length 1. If N is composite, then we only need to check substrings s of length N / p for primes p dividing the length of S.

So determine N = the length of S, then find all its prime factors in time O(sqrt(N)). If there is only one factor N, check if S is the same string repeated N times; otherwise, for each prime factor p, check if S consists of p repetitions of the first N / p characters.
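
As a rough illustration of how few candidates this leaves, the sketch below lists the substring lengths N / p that would have to be tested for a given N. `candidateLengths` is a hypothetical helper written for this example; it uses plain trial division.

```javascript
// For a string of length n, return n / p for every distinct prime p dividing n.
// These are the only substring lengths that need to be checked.
function candidateLengths(n) {
  const lengths = [];
  let m = n; // remaining unfactored part of n
  for (let p = 2; p * p <= m; p++) {
    if (m % p === 0) {
      lengths.push(n / p);
      while (m % p === 0) m /= p; // strip every copy of this prime
    }
  }
  if (m > 1) lengths.push(n / m); // leftover prime factor
  return lengths;
}

console.log(candidateLengths(15)); // [ 5, 3 ]
console.log(candidateLengths(12)); // [ 6, 4 ]
console.log(candidateLengths(7));  // [ 1 ]
```

For N = 12, for instance, only lengths 6 and 4 are tested: a string made of copies of a length-1, length-2, or length-3 unit is also made of copies of a length-6 or length-4 unit.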

I haven't checked the other solutions, but this seems very fast. You can leave out the "If there is only one factor N, check ..., otherwise" part for simplicity, as this is not a special case. Would be nice to see a Javascript implementation that can be run in jsPerf next to the other implementations. — user42723, Apr 24 '19 at 16:59
I've now implemented this in [my answer](https://stackoverflow.com/a/55840471/1774707) — user42723, Apr 25 '19 at 01:22

Axel Podehl · Answer 6 · 2019-04-24T16:17:57.513

10

I think a recursive function might be very fast as well. The first observation is that the maximum repeated pattern length is half as long as the total string. And we could just test all possible repeated pattern lengths: 1, 2, 3, ..., str.length/2

The recursive function isRepeating(p,str) tests if this pattern is repeated in str.

If str is longer than the pattern, the recursion requires the first part (same length as p) to be a repetition as well as the remainder of str. So str is effectively broken up into pieces of length p.length.

If the tested pattern and str are of equal size, recursion ends here, successfully.

If the length is different (happens for "aba" and pattern "ab") or if the pieces are different, then false is returned, propagating up the recursion.

function check(str)
{
  if( str.length==1 ) return true; // trivial case
  for( var i=1;i<=str.length/2;i++ ) { // biggest possible repeated pattern has length/2 characters

    if( str.length%i!=0 ) continue; // pattern of size i doesn't fit
    
    var p = str.substring(0, i);
    if( isRepeating(p,str) ) return true;
  }
  return false;
}


function isRepeating(p, str)
{
  if( str.length>p.length ) { // maybe more than 2 occurrences

    var left = str.substring(0,p.length);
    var right = str.substring(p.length, str.length);
    return left===p && isRepeating(p,right);
  }
  return str===p; 
}

console.log(check('aa')) //true
console.log(check('aaa')) //true 
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

Performance: https://jsperf.com/reegx-and-loop/13

edited Apr 24 '19 at 16:17

answered Apr 24 '19 at 11:36

Axel Podehl

1

Would it be faster to check `if( str===p.repeat(str.length/i) ) return true;` instead of using a recursive function? – Chronocidal Apr 24 '19 at 14:24
1

Don't put console.logs in jsperf tests, prepare the functions inside the globals section, also prepare the test strings in the globals section (sorry, cannot edit the jsperf) – Salman A Apr 24 '19 at 15:17
@Salman - good point. I just modified the jsperf from my predecessor (Pranav C), first time I used jsperf, cool tool. – Axel Podehl Apr 24 '19 at 15:23
@SalmanA : updated : https://jsperf.com/regex-and-loop/1 ... thanks for the info... even I'm not familiar with it(Jsperf) ... thanks for the information – Pranav C Balan Apr 24 '19 at 15:47
Hi Salman, thanks a lot for https://jsperf.com/reegx-and-loop/10 - yes, that new perf test makes much more sense. The setup of functions should go into the preparation code. – Axel Podehl Apr 24 '19 at 16:03
@Chronocidal - very good idea, but I played with it a bit and it seems to be expensive to create the complete string in advance. Performance-wise regex is beating all our other algorithms, probably because it's just ONE Javascript function, implemented natively by the Browser. Also it'll probably depend on the set of strings we test with... – Axel Podehl Apr 24 '19 at 16:05
Thanks Salman for the jsperf tips - now with some REAL BIG test strings performance does look different ! And the recursive algorithm is even faster than regex ;-) Yay ! https://jsperf.com/reegx-and-loop/13 – Axel Podehl Apr 24 '19 at 16:22
@AxelPodehl : yes it's much better than regex with larger string ;) – Pranav C Balan Apr 24 '19 at 16:56
@axelpodehl sure :) – Pranav C Balan Apr 25 '19 at 12:25

score 7 · Answer 7 · answered Apr 25 '19 at 15:25

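I wrote this in Python. I know it is not the language asked about, but it took 30 minutes to write.

def checkString(string):
    # Try every candidate substring length (gap) up to half the string length.
    gap = 1
    while gap <= len(string) // 2:
        # Split the string into consecutive chunks of length gap.
        chunks = [string[i:i+gap] for i in range(0, len(string), gap)]

        # The string is a repetition if every chunk equals the first one.
        if all(chunk == string[:gap] for chunk in chunks):
            print("THEY ARE EQUAL")
            return True

        gap = gap + 1
    return False

checkString("aaeaaeaaeaae")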

SunKnight0 · Answer 8 · 2019-04-24T19:02:56.480

My approach is similar to gnasher729, in that it uses the potential length of the substring as the main focus, but it is less math-y and process intensive:

L: Length of original string

S: Potential lengths of valid sub-strings

Loop S from (the integer part of) L/2 down to 1. If L/S is an integer, check your original string against the first S characters of the original string repeated L/S times.

The reason for looping from L/2 backwards and not from 1 onwards is to get the largest possible substring. If you want the smallest possible substring loop from 1 to L/2. Example: "abababab" has both "ab" and "abab" as possible substrings. Which of the two would be faster if you only care about a true/false result depends on the type of strings/substrings this will be applied to.
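
The description above could be sketched in JavaScript as follows. The function name `checkByRepeat` is an assumption of this example, and `String.prototype.repeat` is used to build the comparison string.

```javascript
// Try substring lengths S from floor(L / 2) down to 1; when S divides L,
// compare the string with its first S characters repeated L / S times.
function checkByRepeat(str) {
  const L = str.length;
  for (let S = Math.floor(L / 2); S >= 1; S--) {
    if (L % S === 0 && str.slice(0, S).repeat(L / S) === str) return true;
  }
  return false;
}

console.log(checkByRepeat('abababab')); // true (first hit is "abab")
console.log(checkByRepeat('aba'));      // false
```

Reversing the loop direction (S from 1 up to L/2) would find the smallest unit first instead.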

score 5 · Answer 9 · answered Apr 27 '19 at 19:41

The following Mathematica code almost detects if the list is repeated at least once. If the string is repeated at least once, it returns true, but it might also return true if the string is a linear combination of repeated strings.

IsRepeatedQ[list_] := Module[{n = Length@list},
   Round@N@Sum[list[[i]] Exp[2 Pi I i/n], {i, n}] == 0
];

This code looks for the "full-length" contribution, which must be zero in a repeating string, but the string accbbd is also considered repeated, as it is a sum of the two repeated strings ababab and 012012.

The idea is to use Fast Fourier Transform, and look for the frequency spectra. By looking at other frequencies, one should be able to detect this strange scenario as well.
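
For readers without Mathematica, a rough JavaScript counterpart might look like the sketch below. It makes several assumptions that are not in the original: it works on character codes, it computes only the single "full-length" Fourier coefficient with a direct sum rather than an FFT, and it compares against a small tolerance `EPS` instead of rounding. Like the original, it is only a necessary condition and produces false positives.

```javascript
// Magnitude of the Fourier coefficient with period equal to the full
// string length. It is (numerically) zero for every repeated string.
function firstHarmonicMagnitude(str) {
  const n = str.length;
  let re = 0;
  let im = 0;
  for (let i = 0; i < n; i++) {
    const angle = (2 * Math.PI * i) / n;
    const value = str.charCodeAt(i);
    re += value * Math.cos(angle);
    im += value * Math.sin(angle);
  }
  return Math.hypot(re, im);
}

const EPS = 1e-6; // assumed tolerance for floating-point noise

console.log(firstHarmonicMagnitude('abcabc') < EPS); // true
console.log(firstHarmonicMagnitude('accbbd') < EPS); // true, the false positive described above
console.log(firstHarmonicMagnitude('hello') < EPS);  // false
```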

score 4 · Answer 10 · answered Apr 30 '19 at 19:40

The basic idea here is to examine any potential substring, beginning at length 1 and stopping at half of the original string's length. We only look at substring lengths that divide the original string length evenly (i.e. str.length % substring.length == 0).

This implementation looks at the first character of each possible substring iteration before moving to the second character, which might save time if the substrings are expected to be long. If no mismatch is found after examining the entire substring, then we return true.

We return false when we run out of potential substrings to check.

function check(str) {
  const len = str.length;
  for (let subl = 1; subl <= len/2; ++subl) {
    if ((len % subl != 0) || str[0] != str[subl])
      continue;
    
    let i = 0; // start at 0 so the first character of every block is compared too
    for (; i < subl; ++i)
    {
      let j = 0;
      for (; j < len; j += subl)
        if (str[i] != str[j + i])
          break;
      if (j != len)
        break;
    }
    
    if (i == subl)
      return true;
  }
  return false;
}

console.log(check('aa')) //true
console.log(check('aaa')) //true
console.log(check('abcabcabc')) //true
console.log(check('aba')) //false
console.log(check('ababa')) //false

GoonGamja · Answer 11 · 2021-01-15T02:03:23.910

It's been more than a year since this question was posted but I used length of string and object form to validate whether it is true or false.

const check = (str) => {
  let count = 0;
  let obj = {};
  if (str.length < 2) return false;
  
  for(let i = 0; i < str.length; i++) {
    if (!obj[str[i]]) {
       count+=1;
      obj[str[i]] = 0;
    };
    obj[str[i]] = obj[str[i]] + 1;
  };
  
  if (Object.values(obj).every(item => item === 1)) {
    return false
  };
  
  if ([...str].length%count === 0) {
    return true
  } else {
    return false
  };
};

console.log(check("abcabcabcac")) // false
console.log(check("aaa")) // true
console.log(check("acaca")) // false
console.log(check("aa")) // true
console.log(check("abc")) // false
console.log(check("aabc")) // false

score -1 · Answer 12 · edited Apr 24 '19 at 10:56

I'm not familiar with JavaScript, so I don't know how fast this is going to be, but here is a linear time solution (assuming reasonable builtin implementation) using only builtins. I'll describe the algorithm in pseudocode.

function check(str) {
    t = str + str;
    find all overlapping occurrences of str in t;
    for each occurrence at position i
        if (i > 0 && i < str.length && str.length % i == 0)
            return true;  // str is a repetition of its first i characters
    return false;
}

The idea is similar to MBo's answer. For each i that divides the length, str is a repetition of its first i characters if and only if it remains the same after shifting for i characters.

It comes to my mind that such a builtin may be unavailable or inefficient. In this case, it is always possible to implement the KMP algorithm manually, which takes about the same amount of code as the algorithm in MBo's answer.
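
One way the pseudocode might translate to JavaScript is sketched below, using repeated `indexOf` calls to enumerate the overlapping occurrences. The name `checkByShift` is made up for this example, and whether the search is linear depends on how the engine implements `indexOf`.

```javascript
// Look for occurrences of str inside str + str at positions 1..n-1 and
// accept the first position that divides the length.
function checkByShift(str) {
  const n = str.length;
  const t = str + str;
  let i = t.indexOf(str, 1); // skip the trivial occurrence at position 0
  while (i !== -1 && i < n) {
    if (n % i === 0) return true; // str is a repetition of its first i characters
    i = t.indexOf(str, i + 1);    // next (possibly overlapping) occurrence
  }
  return false;
}

console.log(checkByShift('abcabcabc')); // true
console.log(checkByShift('ababa'));     // false
```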

edited Apr 24 '19 at 10:56

Peter Mortensen

answered Apr 24 '19 at 08:10

infmagic2047

The OP wants to know whether repetition *exists*. The second line of (the body of) your function *counts* the number of repetitions - that's the bit that needs to be explained. E.g. "abcabcabc" has 3 repetitions of "abc", but how did your second line work out *whether* it had any repetitions? – Lawrence Apr 24 '19 at 12:05
@Lawrence I don't understand your question. This algorithm is based on the idea that the string is a repetition of its substring if and only if for some divisor of its length `i`, `s[0:n-i] == s[i:n]`, or equivalently, `s == s[i:n] + s[0:i]`. Why does the second line need to work out whether it had any repetitions? – infmagic2047 Apr 25 '19 at 00:20
Let me see if I understand your algorithm. First, you append `str` to itself to form `t`, then scan `t` to try to find `str` inside `t`. Okay, this can work (I've retracted my downvote). It's not linear in strlen(str), though. Say `str` is of length L. Then at each position p=0,1,2,..., checking whether str[0..L-1] == t[p..p+L-1] takes O(L) time. You need to do O(L) checks as you go through the values of p, so it's O(L^2). – Lawrence Apr 25 '19 at 04:52

score -10 · Answer 13 · edited Apr 24 '19 at 10:54

One simple idea is to replace every occurrence of the substring with "" and check what is left: if any text remains the result is false, otherwise it is true.

'ababababa'.replace(/ab/gi,'')
"a" // return false
'abababab'.replace(/ab/gi,'')
 ""// return true

edited Apr 24 '19 at 10:54

Peter Mortensen

answered Apr 24 '19 at 06:10

Vinod kumar G

yes, for abc or unicorn wouldn't user will check with /abc/ or /unicorn/ , sorry if i am missing your context – Vinod kumar G Apr 24 '19 at 08:10
3

The question could be clearer, but what it's asking for is a way of deciding whether the string is completely made up of 2 or more repetitions of any other string. It is not searching for a specific substring. – HappyDog Apr 24 '19 at 08:26
2

I've added some clarification to the question, which should make it clearer now. – HappyDog Apr 24 '19 at 09:55
@Vinod if you are already going to use regex you should anchor your match and use test. No reason to modify the string just to validate some condition. – Marie Apr 24 '19 at 11:55
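
The suggestion in the last comment (anchor the match and use test) might look like the sketch below for a candidate substring that is already known. `isRepetitionOf` and `escapeRegExp` are hypothetical helper names; the escaping step is an addition so that characters such as "." in the candidate are matched literally.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when str consists of two or more back-to-back copies of unit.
function isRepetitionOf(str, unit) {
  const pattern = new RegExp('^(?:' + escapeRegExp(unit) + '){2,}$');
  return pattern.test(str);
}

console.log(isRepetitionOf('abababab', 'ab'));  // true
console.log(isRepetitionOf('ababababa', 'ab')); // false
```

This validates the string without modifying it, but it still requires knowing the candidate substring in advance, which the question does not assume.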
