I want to know how this Elixir regex works:
Regex.run(~r{(*UTF)([^\w])+}, "dd!!%%%")
When I execute this regex, the output is:
["!!%%%", "%"]
I'm not able to understand why the last % is repeated after matching the regex.
It looks like you meant to write the pattern:
([^\w]+)
rather than something like:
([^\w])([^\w])...([^\w])
The first one gives the expected results:
iex(1)> Regex.run(~r{(*UTF)([^\w]+)}, "dd!!%%%")
["!!%%%", "!!%%%"]
which is a list containing the whole match followed by what matched the capture groups. The second one produces:
iex(9)> Regex.run(~r{(*UTF)([^\w])([^\w])([^\w])}, "dd!!%%%")
["!!%", "!", "!", "%"]
which follows the same logic.
However, your pattern does not follow the logic of the second example with the repeated capture groups. According to regular-expressions.info:
[a] repeated capturing group will capture only the last iteration
So, at least this is known behavior.
Because you explicitly specified only one capture group:
([^\w])
...only one capture group is created.
The capture group matches one character, and the value of the capture group is repeatedly overwritten with the new match as the regex traverses the string according to the + quantifier. When the end of the string is reached, the capture group contains only the last match.
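To make the difference concrete, here is a small sketch that runs the original pattern next to two alternatives on the same input. The variable names are only for illustration, and the `capture: :all_but_first` option and `Regex.scan/2` are suggestions for getting at the data, not something the question asked for.

```elixir
input = "dd!!%%%"

# Quantifier outside the group: the single group is overwritten on each
# iteration, so only the last matched character ("%") is kept.
last_only = Regex.run(~r{(*UTF)([^\w])+}, input)
IO.inspect(last_only)   # ["!!%%%", "%"]

# Quantifier inside the group: the group captures the whole run.
# capture: :all_but_first drops the whole-match element from the result.
whole_run = Regex.run(~r{(*UTF)([^\w]+)}, input, capture: :all_but_first)
IO.inspect(whole_run)   # ["!!%%%"]

# If every non-word character is wanted separately, scan for single
# characters instead of repeating a capture group.
each_char = Regex.scan(~r{(*UTF)[^\w]}, input)
IO.inspect(each_char)   # [["!"], ["!"], ["%"], ["%"], ["%"]]
```

If the group is only there for grouping and its value is not needed, a non-capturing group such as `(?:[^\w])+` avoids the extra element in the result altogether.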