How to put a sub inside a regex in Perl 6?

Question

That's what I'm trying to do.

>  my sub nplus1($n) {$n +1}
> my regex nnplus1 { ^ (\d+) &nplus1($0) $ }
> "123" ~~ &nnplus1
P6opaque: no such attribute '$!pos' in type Match...

You can try putting the sub in a code block, `my regex nnplus1 { (\d+) { &nplus1($0)} }`, but it will not change the result from 123 to 124. I am not sure yet how to do that. What would be your expected output/result? — Håkon Hægland, Nov 10 '17 at 09:01
This one is buckets of fun, including behavior that I think should be considered a bug (but isn't). I'm experimenting with it now. — piojo, Nov 10 '17 at 09:02
I would expect it to match "123124". I've added anchors to the regex. — Eugene Barsky, Nov 10 '17 at 09:06
@EugeneBarsky The correct version does, without needing any extra modification. — piojo, Nov 10 '17 at 09:14
@EugeneBarsky I found YAMLish to be a great example of advanced grammar/regex techniques. The rakudo source code itself has another--there's one file which parses Perl 6. YAMLish is online, but Leon has given a talk on it which I found gave me the ideas I needed for my project. You can find it online if you search for it. — piojo, Nov 10 '17 at 09:18

piojo · Answer 1 · 2017-11-10T10:31:42.163

4

Keep in mind that regexes are subs. So don't call your matcher a sub—be more specific and call it a regex. Yes, you can pass arguments to regex/token/rule. It's really important to do this when you match languages that change their state as you parse. For example, in YAML, you can parse "data[0]: 17". After that, the next line can start with "data[1]" but not "data[2]". So passing extra info as parameters is useful.

Also note that when you convert this to a regex, some things change. $n+1 takes on a new meaning inside a regex body (which is wrong here). However, simple variables are still interpolated, so you can compute the value in a new variable declared within the regex body with :my $npp = ... and interpolate that instead. Even then, you'll find it still doesn't work. If you add a helper statement like {say "n is $n"}, you'll see the regex is not being passed a valid parameter. This is because in code-like contexts without braces (such as an expression used as an argument to another matcher), rakudo does not update the match variable. When you add braces, the current match variable is recomputed or re-cached. The empty braces look like a typo, so I suggest adding a comment that explains them. The final code is this:

my regex nplus1($n) {
 :my $npp=$n+1;
 $npp
}
my regex nnplus1 { (\d+) {} <nplus1($0)> }
say "123124" ~~ &nnplus1;

In this case (which is basically recursion), I like to keep things neater by changing data in the arguments instead of changing data in the function body: <nplus1($0+1)> instead of defining :my $npp = $n+1;.
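The argument-based variant described above might look like the following. This is only a sketch: the helper name `nextnum` is made up for illustration, and the anchors are added so the whole string has to match.

```raku
# Hypothetical helper: matches the literal text of whatever value it receives.
my regex nextnum($n) { $n }

# The empty {} block is deliberate: it makes the match variable current,
# so that $0 holds the digits captured so far when the argument is evaluated.
my regex nnplus1 { ^ (\d+) {} <nextnum($0 + 1)> $ }

say so '123124' ~~ &nnplus1;   # "123" followed by "124"
say so '123125' ~~ &nnplus1;   # no split of the digits gives n followed by n+1
```

Because `regex` backtracks, `(\d+)` first grabs all the digits and then gives characters back until the rest of the string equals the capture plus one.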

edited Nov 10 '17 at 10:31

answered Nov 10 '17 at 09:13


I've just tested it, and it seems not to work (or rather I don't understand something important). In both matches the first capture takes the whole string, and `nplus1 => ｢｣`. – Eugene Barsky Nov 10 '17 at 10:24
@EugeneBarsky I see. Yeah, that seems weird. I think the `$0` should be 123 and `$` should be 124. Agreed? I'll see if I can make that work... – piojo Nov 10 '17 at 10:26
Yes, that's what I wanted. And with 123 string `$0` should be 1 and `nplus1($0)` should be 2 (if we don't add anchors). – Eugene Barsky Nov 10 '17 at 10:28
@EugeneBarsky I made a big mistake the first time, but with a couple simple changes it works. The bit about interpolating strings with `{}` within a regex is wrong, at least in this situation. See the updated answer, but you need to put `$n+1` in a new variable so you can interpolate that into the regex. – piojo Nov 10 '17 at 10:32
Thanks, it works! But what if I want to make the sub (regex) more complex? E.g. `return ($term = $n) ~~ s/3/4/`. So how to make the same (if possible) with a 'full' sub. – Eugene Barsky Nov 10 '17 at 10:50
Seems I've finally managed to do it with a sub (not a regex). I'll run a couple of tests and then I'll post it. – Eugene Barsky Nov 10 '17 at 11:01
@EugeneBarsky It may be a philosophy difference, in addition to syntax. One might argue that it would be better to use a helper `sub` to do the logic and a `regex` to do the final match. But more experience is probably needed with grammars to know what best practices are. – piojo Nov 10 '17 at 13:03
Yes, I feel the same. So, I'm experimenting with grammars, solving simplified problems (taking parts of my real-life data) using different approaches. Hope, some time I'll be able to do it with better understanding. – Eugene Barsky Nov 10 '17 at 13:07

Eugene Barsky · Answer 2 · 2017-11-10T12:28:59.077

3

Based on the Regex interpolation docs as well as on piojo's answer and Håkon Hægland's comment, it seems I've managed to do what I wanted:

my sub nplus1($n) {
 $n+1;
}
my regex nnplus1 { (\d+) {} <nplus1=$(nplus1($0))> }
say "123124" ~~ &nnplus1;

Output:

｢123124｣
 0 => ｢123｣
 nplus1 => ｢124｣

Or we can move the {} to enclose the interpolated sub:

my sub nplus1($n) {
 $n+1;
}
my regex nnplus1 { (\d+)  <nplus1={nplus1($0)}> }
say "123124" ~~ &nnplus1;

(the output will be the same)

edited Nov 10 '17 at 12:28

answered Nov 10 '17 at 11:19


That's interesting. On one hand, I don't see why you would do it that way. That's probably semantically very similar to my answer, though this is syntax I would not have thought to try. On the other hand, I can't see any reason not to do it this way. I guess it depends on the contents of `nplus1`. Does that function do real computation, setting state and causing side effects? If so, then yes, it makes sense to call it a `sub`. In my case, when I needed matchers that accepted parameters, the matcher usually just needed to accept a different length of indentation, hence I called it a sub. – piojo Nov 10 '17 at 12:15
By the way, I think you have an extra layer of `$()` in the interpolation that doesn't seem to do anything. – piojo Nov 10 '17 at 12:15
@piojo I haven't decided yet how to use it in my real programs, so I'm trying different possibilities to learn the syntax. I made the sample code with numbers, since it's easier to write. In reality, I'll have to deal mostly with strings. As for the extra layer of `$()`, I just wanted to ask if there were ways to simplify my code. So, how should I write it without the additional `$()`? – Eugene Barsky Nov 10 '17 at 12:21
I think `` gives the same result as ``. – piojo Nov 10 '17 at 12:26
One of my goals is to find the best way to combine two patterns, that cannot be represented as a combination (concatenation) of segments. I. e. given consonants `$c1, $c2, $c3` and vowels `$v1, $v2` I have to make a pattern `$c1$v1$c2$v2$c3`. So with known vowels 'a' and 'e' and unknown consonants the pattern will become ` a e `. – Eugene Barsky Nov 10 '17 at 12:27
Yes, taking away `$()` gives the same result. I'll fix my answer. – Eugene Barsky Nov 10 '17 at 12:28
Another goal is matching ` `, where `pattern2` depends on the exact match of `pattern1`. This dependence may be rather complicated, so it would be inconvenient to write it in the regex. – Eugene Barsky Nov 10 '17 at 12:48
@EugeneBarsky Maybe one of the ["and in between" regex operators (`%` or `%%`)](https://docs.perl6.org/language/regexes#Modified_quantifier:_%,_%%) are of interest? eg `say 'foo BAR baz QUX waldo'.match: rule { ^ [<:Ll>+] + % <:Lu>+ $ }`. You can definitely get the behavior/matching of a rule/pattern to depend on what another pattern earlier matched. There are known caching bugs but here's an example that at least illustrates how things are supposed to work: `say 'foo FOO baz FOOBAZ waldo'.match: rule { ^ ([<:Ll>+]) + % <{$0>>.uc.join: ''}> $ }`. Lots going on there of course but hth, – raiph Nov 10 '17 at 18:50
@raiph That's very useful. I read about `%` and `%%`, but I didn't realize that the 2nd operand can also be a regex! Here I don't fully understand how the `rule` works — I thought spaces are not allowed inside a rule? – Eugene Barsky Nov 10 '17 at 19:16
You've seriously misunderstood `rule`! In rules declared with the `regex` or `token` declarators, spaces (unquoted ones) are not significant. Put 'em in, take 'em out, the matching stays the same. So `regex { foobar }` and `token { fo ob ar }` match exactly the same strings. In contrast, in rules declared with `rule`, whitespace after an atom is significant. In fact this is the only way a `rule` differs from a `token`. Whitespace after an atom in a `rule` is automatically replaced with a `<.ws>` which matches zero or more whitespace characters. – raiph Nov 10 '17 at 19:46
Do I understand correctly, that here it's the whitespace between `([<:Ll>+])` and `+`? – Eugene Barsky Nov 10 '17 at 19:52

moritz · Accepted Answer · 2017-11-13T12:25:14.347

3

The <{...}> construct runs Perl 6 code inside a regex, and evaluates the result as a regex:

my sub nplus1($n) {$n +1}
my regex nnplus1 { ^ (\d+) <{ nplus1($0) }> $ }
say so '23' ~~ &nnplus1; # Output: True
say so '22' ~~ &nnplus1; # Output: False
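To illustrate what "evaluates the result as a regex" means in practice, here is a small sketch contrasting `<{...}>` with plain variable interpolation, which matches the string literally. The variable name `$pattern` and the sample strings are assumptions made up for this example.

```raku
# Assumed sample pattern: as a regex, the dot matches any single character.
my $pattern = 'a.c';

# <{ ... }> compiles the returned string as a regex.
say so 'abc' ~~ / ^ <{ $pattern }> $ /;   # True

# A bare $variable is matched as literal text, so the dot must be a real dot.
say so 'abc' ~~ / ^ $pattern $ /;         # False
say so 'a.c' ~~ / ^ $pattern $ /;         # True
```

For a numeric result such as `nplus1($0)` both forms behave the same, since digits have no special meaning in a regex. The distinction matters once the computed value is an arbitrary string.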

edited Nov 13 '17 at 12:25

answered Nov 13 '17 at 09:05

