RegEx capturing group in Elixir

Question

I want to know how this Elixir regex works.

 Regex.run(~r{(*UTF)([^\w])+}, "dd!!%%%")

When I execute this regex, the output is:

["!!%%%", "%"]

I'm not able to understand why the last % is repeated after matching the regex.

Just remove the parentheses that surround `[^\w]` to avoid keeping matching groups. - Allan, May 13 '19 at 02:55

7stud · Answer 1 · 2019-05-13T21:22:10.443

I'm not able to understand why the last % is repeated after matching the regex.

It looks like you meant to write the pattern:

([^\w]+)

rather than something like:

([^\w])([^\w])...([^\w])

The first one gives the expected results:

iex(1)> Regex.run(~r{(*UTF)([^\w]+)}, "dd!!%%%")
["!!%%%", "!!%%%"]

which is a list containing the whole match followed by what matched the capture groups. The second one produces:

iex(9)> Regex.run(~r{(*UTF)([^\w])([^\w])([^\w])}, "dd!!%%%")
["!!%", "!", "!", "%"]

which follows the same logic.

However, your pattern does not follow the logic of the second example with the repeated capture groups. According to regular-expressions.info:

[a] repeated capturing group will capture only the last iteration

So, at least this is known behavior.

Because your pattern contains only one pair of capturing parentheses:

([^\w])

...only one capture group is created, no matter how many times the + quantifier repeats it.

The capture group matches one character, and its value is overwritten with each new match as the + quantifier repeats the group along the string. When the match ends (here, at the end of the string), the capture group holds only the last character it matched.
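The overwriting described above can be seen by comparing the original pattern with a per-character scan. This is only a minimal sketch: the variable names are made up for illustration, and the use of `Regex.scan/2` is a suggestion for inspecting each iteration, not something the question asked for.

```elixir
input = "dd!!%%%"

# One pair of capturing parentheses means one capture slot.
# The + repeats the group, and each repetition overwrites the slot,
# so only the last matched character ("%") is left in group 1.
last_only = Regex.run(~r{(*UTF)([^\w])+}, input)
IO.inspect(last_only)

# To see every character the group would have matched along the way,
# scan for single non-word characters instead of repeating a group.
each_char = Regex.scan(~r{(*UTF)[^\w]}, input)
IO.inspect(each_char)
```

Here `last_only` holds the whole match followed by the final value of group 1, while `each_char` holds one single-element list per non-word character.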

score 1 · Answer 2 · edited Jun 20 '20 at 09:12

This tool helps you to see how your expression works:

([^\w])+

RegEx Circuit

You can visualize your expressions in this link:

Code

If you only want !!%%% returned as the full match, without group 1, dropping the parentheses should work:

Regex.run(~r{(*UTF)[^\w]+}, "dd!!%%%")
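If the grouping is still wanted for readability, there are a couple of other ways to get only the full match. The sketch below is one possible set of alternatives, not the answer's own recommendation; the variable names are hypothetical, and it relies on the non-capturing group syntax `(?:...)` and the `:capture` option of `Regex.run/3`.

```elixir
input = "dd!!%%%"

# No parentheses at all: nothing is captured besides the full match.
no_group = Regex.run(~r{(*UTF)[^\w]+}, input)

# A non-capturing group keeps the grouping but creates no capture slot.
non_capturing = Regex.run(~r{(*UTF)(?:[^\w])+}, input)

# Keep the original pattern and ask Regex.run/3 for the full match only.
first_only = Regex.run(~r{(*UTF)([^\w])+}, input, capture: :first)

IO.inspect({no_group, non_capturing, first_only})
```

All three calls should return a one-element list containing just the full match.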

Emma · answered May 13 '19 at 02:07 · edited Jun 20 '20 at 09:12 (Community)
