Questions tagged [regex-recursion]

Some modern regex flavors support recursion: Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three support regular expression recursion. Ruby 1.9 supports capturing group recursion (the whole regex can be recursed if it is wrapped in a capturing group). .NET does not support recursion, but its balancing groups can be used instead to match balanced constructs.

From regular-expressions.info:
Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three, support regular expression recursion. Perl uses the syntax (?R) with (?0) as a synonym. Ruby 2.0 uses \g<0>. PCRE supports all three as of version 7.7. Earlier versions supported only the Perl syntax (which Perl actually copied from PCRE). Recent versions of Delphi, PHP, and R also support all three, as their regex functions are based on PCRE. JGsoft V2 also supports all variations of regex recursion.

While Ruby 1.9 does not have any syntax for regex recursion, it does support capturing group recursion. So you could recurse the whole regex in Ruby 1.9 if you wrap the whole regex in a capturing group. .NET does not support recursion, but it supports balancing groups that can be used instead of recursion to match balanced constructs.

As we'll see later, there are differences in how Perl, PCRE, and Ruby deal with backreferences and backtracking during recursion. While they copied each other's syntax, they did not copy each other's behavior. JGsoft V2, however, copied their syntax and their behavior. So JGsoft V2 has three different ways of doing regex recursion, which you choose by using a different syntax. But these differences do not come into play in the basic example on this page.

Boost 1.42 copied the syntax from Perl but its implementation is marred by bugs, which are still not all fixed in version 1.62. Most significantly, quantifiers other than * or {0,} cause recursion to misbehave. This is partially fixed in Boost 1.60 which correctly handles ? and {0,1} too.

The regexes a(?R)?z, a(?0)?z, and a\g<0>?z all match one or more letters a followed by exactly the same number of letters z. Since these regexes are functionally identical, we'll use the syntax with R for recursion to see how this regex matches the string aaazzz.
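To make "the same number of letters z" concrete, here is a minimal sketch in Python. Python's standard re module has no recursion syntax, so this does not run the regex itself. Instead, it hand-writes a recursive function that mirrors the structure of a(?R)?z. The names match_here and search are hypothetical helpers introduced only for this illustration.

```python
def match_here(text, pos=0):
    """Mimic a(?R)?z anchored at pos; return the end index or None."""
    # "a" must match at the current position.
    if pos >= len(text) or text[pos] != "a":
        return None
    after_a = pos + 1

    # (?R): try the whole pattern again right after the "a" (greedy attempt).
    inner_end = match_here(text, after_a)
    if inner_end is not None and inner_end < len(text) and text[inner_end] == "z":
        return inner_end + 1

    # (?R)? is optional: fall back to matching "z" directly after "a".
    if after_a < len(text) and text[after_a] == "z":
        return after_a + 1
    return None


def search(text):
    """Return the first substring matched at any start position, or None."""
    for start in range(len(text)):
        end = match_here(text, start)
        if end is not None:
            return text[start:end]
    return None


print(search("aaazzz"))  # three a's followed by three z's
print(search("xaaazz"))  # only two z's are available
```

With one z missing, the attempt that starts at the first a fails, and the search moves on to a later start position where the counts balance.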

First, a matches the first a in the string. Then the regex engine reaches (?R). This tells the engine to attempt the whole regex again at the present position in the string. Now, a matches the second a in the string. The engine reaches (?R) again. On the second recursion, a matches the third a. On the third recursion, a fails to match the first z in the string. This causes (?R) to fail. But the regex uses a quantifier to make (?R) optional. So the engine continues with z which matches the first z in the string.

Now, the regex engine has reached the end of the regex. But since it's two levels deep in recursion, it hasn't found an overall match yet. It has only found a match for (?R). Exiting the recursion after a successful match, the engine also reaches z. It now matches the second z in the string. The engine is still one level deep in recursion, from which it exits with a successful match. Finally, z matches the third z in the string. The engine is again at the end of the regex. This time, it's not inside any recursion. Thus, it returns aaazzz as the overall regex match.
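The walkthrough above can be replayed with a small tracing sketch in Python. This is an illustration under assumptions rather than how any real engine is implemented: the function trace and its log format are hypothetical, and the code is a simplified hand-written model of a(?R)?z that records the recursion depth at which each token is tried.

```python
def trace(text, pos=0, depth=0, log=None):
    """Model a(?R)?z at pos; return (end index or None, list of log lines)."""
    log = [] if log is None else log

    if pos >= len(text) or text[pos] != "a":
        log.append(f"depth {depth}: a fails at index {pos}")
        return None, log
    log.append(f"depth {depth}: a matches index {pos}")

    # (?R): attempt the whole pattern again, one level deeper.
    inner_end, _ = trace(text, pos + 1, depth + 1, log)

    # If the recursion failed, (?R)? matches nothing and z is tried right after a.
    # Simplification: when the recursion succeeds but z then fails, a real engine
    # would retry without the recursion; for this pattern that retry cannot succeed.
    z_pos = pos + 1 if inner_end is None else inner_end
    if z_pos < len(text) and text[z_pos] == "z":
        log.append(f"depth {depth}: z matches index {z_pos}")
        return z_pos + 1, log
    log.append(f"depth {depth}: z fails at index {z_pos}")
    return None, log


end, steps = trace("aaazzz")
for line in steps:
    print(line)
print("overall match:", "aaazzz"[:end])
```

In the printed log, depth 0 is the top-level attempt. The first z is matched at depth 2, which corresponds to the engine being "two levels deep in recursion" in the description above.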

22 questions
0
votes
2 answers

Regex for getting multiple words after a delimiter

I have been trying to get the separate groups from the below string using regex in PCRE: drop = blah blah blah something keep = bar foo nlah aaaa rename = (a=b d=e) obs=4 where = (foo > 45 and bar == 35) Groups I am trying to make is like: 1. drop =…
Frosty
0
votes
2 answers

Match multiple strings in multi lines and also replace multiple strings

I want to replace all the strings with DISPLAY="TRUE" to DISPLAY="FALSE" in the first line and vice versa in the next line in a single match. Example: FROM: Appels
Jerald Sabu M
0
votes
1 answer

MySQL query to get count of repeating characters from a string

My target data/table: mysql> select firstname from empl; +-----------+ | firstname | +-----------+ | Abhishek | | Arnab | | Aamaaan | | Arbaaz | | Mohon | | Parikshit | | Tom | | Koustuv | | Amit | | Bibhishana| |…
mysqlrockstar
0
votes
1 answer

Extracting number of specific length from a string in Postgres

I am trying to extract a set of numbers from comments like "on april-17 transactions numbers are 12345 / 56789" "on april-18 transactions numbers are 56789" "on may-19 no transactions" Which are stored in a column called "com" in table comments …
Maverick
0
votes
1 answer

Get characters which are not in brackets REGEX

I'm basically working on custom query building. I've designed the pattern as field_set and sub_field_sets. A sample Query: ({e:3}.{f:44}.{f:2}) + ( ({e:3}.{f:44}.{f:3}) + ({e:3}.{f:44}.{f:4}) ) -…
Sarmad
0
votes
0 answers

How to extract a list of all 18 character entries after specific phrase in string using RegExr?

I managed to extract a list of the text within square brackets within an emoji list I have here: https://regexr.com/3sqk1 But now I need to extract the equivalent decimalSurrogateHtml pairs for each emoji (I know a few of them have 2 pairs but would…
deeve
-2
votes
2 answers

Regex - Recursion - nested matches with multiple ending

This is my first question on stackoverflow, so please bear with me here. Also, I am not a native English speaker. (16.02.2022) ANSWER (https://regex101.com/r/4FRznK/1 from Comment on Answer). Special thanks to Casimir et Hippolyte for your help! I…