4

When I store a regex with capturing groups in a variable and then match against it, the whole match is OK, but the capturing groups are Nil.

my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;

$str ~~ / $rgx / ;
say ~$/;  # 12abc34
say $0;   # Nil
say $1;   # Nil

If I modify the program to avoid $rgx, everything works as expected:

my $str = 'nn12abc34efg';

my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;

$str ~~ / ($atom) \w+ ($atom) /;
say ~$/;  # 12abc34
say $0;   # 「12」
say $1;   # 「34」
Eugene Barsky
  • Interesting question. I am not sure why this happens, but you could make `$rgx` a named regex using e.g. `my regex rgx { ($atom) \w+ ($atom) }`. Then after `$str ~~ / <rgx> /`, `$<rgx>[0]` would represent the first capture group (for example). – Håkon Hægland Oct 13 '17 at 09:16
  • See also [How can I interpolate a variable into a Perl 6 regex?](https://stackoverflow.com/q/40883160/2173773) – Håkon Hægland Oct 13 '17 at 09:24
  • Thanks!! Didn't know about named regexes. – Eugene Barsky Oct 13 '17 at 09:31

2 Answers

5

With your code, the compiler gives the following warning:

Regex object coerced to string (please use .gist or .perl to do that)

That tells us something is wrong: regexes shouldn't be treated as strings. There are two proper ways to nest regexes. First, you can include sub-regexes within assertions (<>):

my $str = 'nn12abc34efg';
my Regex $atom = / \d ** 2 /;
my Regex $rgx = / (<$atom>) \w+ (<$atom>) /;
$str ~~ $rgx;
say $0;   # 「12」
say $1;   # 「34」

Note that I'm not matching / $rgx /, because that puts one regex inside another. Just match $rgx directly.

The nicer way is to use named regexes. Defining atom and the regex as follows will let you access the match groups as $<atom>[0] and $<atom>[1]:

my $str = 'nn12abc34efg';
my regex atom { \d ** 2 };
my $rgx = / <atom> \w+ <atom> /;
$str ~~ $rgx;
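
As a minimal sketch of reading those captures back (the variable name @digits is only illustrative, and the regex is written inline rather than stored in $rgx): because <atom> is called twice, $<atom> holds a list of Match objects, one per call.

```raku
my $str = 'nn12abc34efg';
my regex atom { \d ** 2 }

# Both <atom> calls capture under the same name,
# so $<atom> is a list of two Match objects.
my @digits;
if $str ~~ / <atom> \w+ <atom> / {
    @digits = $<atom>.map(*.Str);   # stringify each Match
}
say @digits;   # [12 34]
```

Guarding with `if` avoids reading captures from a failed match.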
piojo
  • Thank you for the perfect answer! My understanding of p6 regex syntax, especially the use of `<>` is rather vague. – Eugene Barsky Oct 14 '17 at 13:20
  • @evb Glad I helped. I actually don't know why the original code didn't work. I speculate it's because of how the three regular expressions are composed, and I wonder whether the match group is being set then unset as a nested regex is matched. Perhaps it is a bug in rakudo, since nesting doesn't unset matches in the other two variations. But the fact that the compiler warned us lets it off the hook in my book. – piojo Oct 14 '17 at 16:29
  • I've tried your 2nd solution with `(<$atom>)` and it still doesn't work — both `$0` and `$1` are `Nil`. – Eugene Barsky Oct 15 '17 at 08:39
  • The 3rd solution doesn't work for me either. It wouldn't compile with `Regex`. If I change it to `regex`, it compiles and gives correct `$/`, but `$<atom>` is `Nil`. – Eugene Barsky Oct 15 '17 at 08:46
  • @evb Ahh, I left out one important detail: when you write `/ $rgx /`, you're putting the regex inside a regex. Don't do that. Match as: `$str ~~ $rgx`. – piojo Oct 15 '17 at 13:31
  • And I fixed the capitalization error, and removed the first paragraph (which was plain wrong). Very sorry about the mistakes. – piojo Oct 15 '17 at 13:46
4

The key observation is that $str ~~ / $rgx /; is a "regex inside of a regex". $rgx matched as it should and set $0 and $1 within its own Match object, but there was nowhere within the surrounding Match object to store that information, so you couldn't see it. An example may make this clearer. Try this:

my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;

$str ~~ / $0=$rgx /;
say $/;

Note the contents of $0. Or as another example, let's give it a proper name:

my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;

$str ~~ / $<bits-n-pieces>=$rgx /;
say $/;
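
The same nesting can be sketched with named regexes, where the inner Match is captured under a name automatically. This is only an illustration: the name two-atoms is hypothetical, and <atom> replaces the interpolated $atom from the question. The positional captures belong to the Match made by <two-atoms>, so they are reached through it rather than through the top-level $0 and $1.

```raku
my $str = 'nn12abc34efg';
my regex atom      { \d ** 2 }
my regex two-atoms { (<atom>) \w+ (<atom>) }   # hypothetical name

my ($first, $second);
if $str ~~ / <two-atoms> / {
    # The outer Match has one named capture; the positional
    # captures live inside that nested Match object.
    $first  = ~$<two-atoms>[0];
    $second = ~$<two-atoms>[1];
}
say "$first $second";   # 12 34
```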
perlpilot