I'm trying to run this regex, but it freezes my console. Why?

var str = "Шедевры православной музыки - 20 золотых православных песен";
str.match(/^(([\u00C0-\u1FFF\u2C00-\uD7FF]+[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]*)+) [a-z]+[^\u00C0-\u1FFF\u2C00-\uD7FF]*$/i);
Moshe Simantov
  • What do you mean by stuck? You enter this in the console and it freezes? – XCS Feb 17 '16 at 10:59
  • It just causes [catastrophic backtracking](https://regex101.com/r/eT6gL3/1) due to the `(([\u00C0-\u1FFF\u2C00-\uD7FF]+[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]*)+)` part. More details on [catastrophic backtracking can be found here](http://www.regular-expressions.info/catastrophic.html). What are the actual requirements for the regex? – Wiktor Stribiżew Feb 17 '16 at 11:01
  • Are you looking for [`^([\u00C0-\u1FFF\u2C00-\uD7FF]+(?:[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+[\u00C0-\u1FFF\u2C00-\uD7FF]+)*) [a-z]+[^\u00C0-\u1FFF\u2C00-\uD7FF]*$`](https://regex101.com/r/eT6gL3/2)? – Wiktor Stribiżew Feb 17 '16 at 11:06

1 Answer

Your regex causes catastrophic backtracking (see a demo of your regex here) because of the `(([\u00C0-\u1FFF\u2C00-\uD7FF]+[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]*)+)` part. Since `[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]*` can match zero characters, the group effectively reduces to the classic `(a+)+` pattern (compare `([\u00C0-\u1FFF\u2C00-\uD7FF]+)+`), which lets the engine split the same run of characters between iterations in exponentially many ways once the overall match fails.
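To get a feel for the scale of the problem, here is a rough sketch that counts how many ways a nested quantifier like `(a+)+` can divide a run of characters into iterations. The helper `countSplits` is a hypothetical name introduced for illustration, and the count is only a model of the alternatives a backtracking engine may have to try before giving up, not an exact step count for any particular engine.

```javascript
// Hypothetical helper: counts the ways a run of n matching characters can be
// divided into one or more non-empty consecutive chunks, which is how many
// distinct ways a pattern shaped like (x+)+ can consume that run.
function countSplits(n) {
  if (n === 0) return 1; // nothing left to divide: one complete split
  let total = 0;
  // The first chunk takes `first` characters; the rest is divided recursively.
  for (let first = 1; first <= n; first++) {
    total += countSplits(n - first);
  }
  return total;
}

// One Cyrillic word from the question's string has 12 letters.
const wordLength = "православной".length;
console.log(wordLength, countSplits(wordLength)); // 12 2048
```

The count doubles with every extra character (2^(n-1) for a run of n), and the question's string contains several such words, so the alternatives multiply.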

To get rid of it, make every subpattern inside the group mandatory (quantified with + rather than *), and apply the * quantifier to the group as a whole:

^([\u00C0-\u1FFF\u2C00-\uD7FF]+(?:[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+[\u00C0-\u1FFF\u2C00-\uD7FF]+)*) [a-z]+[^\u00C0-\u1FFF\u2C00-\uD7FF]*$

See regex demo

Here, [\u00C0-\u1FFF\u2C00-\uD7FF]+(?:[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+[\u00C0-\u1FFF\u2C00-\uD7FF]+)* matches:

  • [\u00C0-\u1FFF\u2C00-\uD7FF]+ - one or more characters from the \u00C0-\u1FFF\u2C00-\uD7FF ranges
  • (?:[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+[\u00C0-\u1FFF\u2C00-\uD7FF]+)* - zero or more sequences of:
    • [^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+ - one or more characters other than those from the a-z\u00C0-\u1FFF\u2C00-\uD7FF ranges
    • [\u00C0-\u1FFF\u2C00-\uD7FF]+ - one or more characters from the \u00C0-\u1FFF\u2C00-\uD7FF ranges.
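As a quick check, the rewritten pattern can be run against the string from the question. This is a minimal sketch: the variable names and the second input `sample` are assumptions made up for illustration and do not come from the question.

```javascript
// Rewritten pattern: every repeated piece must consume at least one character.
const fixed = /^([\u00C0-\u1FFF\u2C00-\uD7FF]+(?:[^a-z\u00C0-\u1FFF\u2C00-\uD7FF]+[\u00C0-\u1FFF\u2C00-\uD7FF]+)*) [a-z]+[^\u00C0-\u1FFF\u2C00-\uD7FF]*$/i;

// The question's string has no Latin word after the Cyrillic part, so the
// pattern does not match it; the difference is that the failure is now
// reported promptly instead of freezing the console.
const original = "Шедевры православной музыки - 20 золотых православных песен";
const noMatch = original.match(fixed);
console.log(noMatch); // null

// Hypothetical input (not from the question) that ends in a Latin word.
const sample = "Шедевры православной музыки remix";
const m = sample.match(fixed);
console.log(m && m[1]); // Шедевры православной музыки
```

Note that the pattern still requires a space followed by Latin letters after the captured part, so an input without such a tail returns null by design.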
Wiktor Stribiżew