11

I want to match against a programmatically-constructed regex, containing a number of (.*) capture groups. I have this regex as a string, say

my $rx = "(.*)a(.*)b(.*)"

I would like to interpolate that string as a regex and match for it. The docs tell me <$rx> should do the trick (i.e. interpolate that string as a regex), but it doesn't. Compare the output of a match (in the perl6 REPL):

> 'xaybz' ~~ rx/<$rx>/
「xaybz」

vs the expected/desired output, setting apart the capture groups:

> 'xaybz' ~~ rx/(.*)a(.*)b(.*)/
「xaybz」
 0 => 「x」
 1 => 「y」
 2 => 「z」

Comments

One unappealing way I can do this is to EVAL my regex match (also in the REPL):

> use MONKEY; EVAL "'xaybz' ~~ rx/$rx/";
「xaybz」
 0 => 「x」
 1 => 「y」
 2 => 「z」

So while this does give me a solution, I'm sure there's a string-interpolation trick I'm missing that would obviate the need to rely on EVAL.

grobber

3 Answers

10

The captures made inside <$rx> are not exposed on the top-level match unless the sub-match is bound to a name. This will work:

my $rx = '(.*)a(.*)b(.*)';
'xaybz' ~~ rx/$<result>=<$rx>/;
say $<result>;
# OUTPUT: «「xaybz」␤ 0 => 「x」␤ 1 => 「y」␤ 2 => 「z」␤»

By binding the sub-match to a named capture, you get access to its raw Match object, positional captures included, which you can then print. The problem is that <$rx> produces a Match of its own rather than being spliced in as a String, so the outer regex ends up matching against that Match. Possibly the inner Match is stringified and then matched; that is the closest I can come to explaining the result.

jjmerelo
  • Thank you very much, but I don't think I really understand what happened there. Could you please point me to where this is documented? – grobber Oct 19 '20 at 07:33
  • What confuses me is that "stringification" does not seem to be what's happening. Note the quotation marks `「 」` around the result I get. If I stringify the match with `$/.Str` those are gone. So my getting `「xaybz」` cannot be due to "stringification", as that's still a Match object. The problem seems to be that the capturing parentheses are ignored in my initial attempt, and I don't see why that is. – grobber Oct 19 '20 at 08:40
  • @grobber that's right. It's simply being converted into a different Match object. I'll edit a bit to try and explain what's going on. Essentially, `<$rx>` is a Match, and you're matching to a Match, which is stringified... – jjmerelo Oct 19 '20 at 11:31
9

The problem is that things in <…> don't capture in general.

'xaybz' ~~ / <:Ll> <:Ll> <:Ll> /
# 「xay」

They do capture if the first character after < is alphabetic.

my regex foo { (.*)a(.*)b(.*) }

'xaybz' ~~ / <foo> /;
# 「xaybz」
#  foo => 「xaybz」
#   0 => 「x」
#   1 => 「y」
#   2 => 「z」

That also applies if you use <a=…>

'xaybz' ~~ / <rx=$rx> /;
# 「xaybz」
#  rx => 「xaybz」
#   0 => 「x」
#   1 => 「y」
#   2 => 「z」

Of course you can assign it on the outside as well.

'xaybz' ~~ / $<rx> = <$rx> /;
# 「xaybz」
#  rx => 「xaybz」
#   0 => 「x」
#   1 => 「y」
#   2 => 「z」

'xaybz' ~~ / $0 = <$rx> /;
# 「xaybz」
#  0 => 「xaybz」
#   0 => 「x」
#   1 => 「y」
#   2 => 「z」

Note that <…> is a sub-match, so the $0, $1, $2 from the $rx will never be on the top level; they sit one level down, under the name you gave the sub-match.
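To make the nesting concrete, here is a minimal sketch of pulling the inner positional captures out of the named sub-match. The capture name inner and the variables $m and @parts are my own choices for illustration, not anything required by the language.

```raku
my $rx = '(.*)a(.*)b(.*)';

# Bind the interpolated regex to a named capture (the name "inner" is
# arbitrary) so that its own sub-captures are kept in the result.
my $m = 'xaybz' ~~ / $<inner>=<$rx> /;

# The positional captures of $rx live under the named capture,
# not on the top-level match.
my @parts = $m<inner>.list.map(*.Str);

say @parts;         # [x y z]
say $m.list.elems;  # 0, since nothing positional is captured at the top level
```

If you need the pieces at the top level, copying them out like this after the match is probably the simplest route.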

Brad Gilbert
  • Thanks for that explanation. The bit about non-alphabetic `<>` in particular was illuminating. Note that the issue propagates: if I first do `my $rx='(\d)'; my regex R { <$rx> }; '12' ~~ /<R>/` it still doesn't capture. – grobber Oct 19 '20 at 15:14
  • grobber: that's because you defined `R` as `<$rx>`. If you do `regex R { }` then it will work. One of the reasons for requiring the naming is that positional captures are counted at compile time. If you do `regex R { <$rx> (.) }` it will be impossible to know what the number for `(.)` should be. Named captures don't suffer from the same limitation. – user0721090601 Oct 19 '20 at 15:51
  • "Note that the issue propagates". I'm struck by your choice of the word *issue*. The fact that only `<...>` assertions that start with an alphabetic character capture is a deliberate feature. In your example... ah, @user0721090601 has covered that. :) – raiph Oct 19 '20 at 15:52
  • @raiph: only an "issue" from my previously-uninformed perspective. Thank you both (yourself and @user0721090601): the motivation for requiring named `<>` is clear now. – grobber Oct 19 '20 at 18:04
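
The point made in the comments about compile-time numbering can be sketched as follows. This is only one possible shape, assuming the interpolated regex is bound with the `$<name>=<$rx>` form used in the answers; the capture name inner is hypothetical.

```raku
my $rx = '(\d)';

# The interpolated regex is bound to a named capture, so it takes no
# positional slot. The literal (.) group can therefore be numbered
# when the regex is compiled.
my regex R { $<inner>=<$rx> (.) }

my $m = '12' ~~ / <R> /;

say $m<R><inner>[0];  # the group captured inside the interpolated $rx
say $m<R>[0];         # the literal (.) group of R
```

Because whatever `$rx` captures stays under its own name, the numbering of `(.)` does not depend on how many groups the string happens to contain at run time.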
1

You could do the following to expose the inner regex result to an outside variable:

my $rx = "(.*)a(.*)b(.*)";
my $result;

'xaybz' ~~ / $<result>=<$rx> {$result = $<result>}/;

say $result;

# OUTPUT:

# 「xaybz」
# 0 => 「x」
# 1 => 「y」
# 2 => 「z」
jakar