Building a regex with sub in Perl 6

Question

After learning how to pass regexes as arguments, I've tried to build my first regex using a sub, and I'm stuck once more. Sorry for the complex rules below; I've done my best to simplify them. I need at least some clues on how to approach this problem.

The regex should consist of alternatives, each consisting of a left, a middle and a right part, where left and right come in pairs and the variant of middle depends on which right is chosen.

An array of Pairs contains pairs of left and right:

my Pair @leftright =
  A => 'a',
  ...
  Z => 'z',
  ;

Middle variants are read from a hash:

my Regex %middle = 
  z => / foo /,
  a => / bar /,
  m => / twi /,
  r => / bin /,
  ...
  ;

%middle<z> should be chosen if right is z, %middle<a> if right is a, and so on.

So, the resulting regex should be

my token word {
    | A <{ %middle<a> }> a
    | Z <{ %middle<z> }> z
    | ...
}

or, more generally

my token word {
    | <left=@leftright[0].key> 
      <middle=%middle{@leftright[0].value}> 
      <right=@leftright[0].value> 
    | (the same for index == 1)
    | (the same for index == 2)
    | (the same for index == 3)
 ...
}

and it should match Abara and Zfooz.
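For just the two sample pairs, the target token can be written out by hand, using named captures (`$<name>=...`) in place of the `<left=...>` notation above. This is only a sketch: it assumes the middle parts are plain strings ('bar', 'foo') rather than Regex objects, and it does not yet build anything from `@leftright` or `%middle`.

```raku
# Hand-written target for the sample pairs A/a and Z/z.
# Assumption: the middle parts are plain strings, not Regex objects.
my token word {
    | $<left>='A' $<middle>='bar' $<right>='a'
    | $<left>='Z' $<middle>='foo' $<right>='z'
}

if "Abara" ~~ &word {
    say join '|', $<left>, $<middle>, $<right>;   # A|bar|a
}

if "Zfooz" ~~ &word {
    say join '|', $<left>, $<middle>, $<right>;   # Z|foo|z
}
```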

How can I build token word (to be used, e.g., in a grammar) with a sub that takes every pair from @leftright, inserts the matching %middle{} entry depending on the value of right, and then combines everything into one regex?

my Regex sub sub_word(Pair @l_r, Regex %m) {
...
}
my token word {
    <{sub_word(@leftright, %middle)}> 
}

After the match I need to know the values of left, middle, and right:

"Abara" ~~ &word;
say join '|', $<left>, $<middle>, $<right> # A|bar|a

It sounds like you're saying something to the effect that, given a right that's `X`, the middle is `/ waldo /`. But you haven't said what programmatic relationship is supposed to be detected between `X` and `waldo`. (Granted, you wrote `z` not `X` and `/ zfoo /` not `/ waldo /`, but that makes no difference, unless you mean that the `z` in `/ zfoo /` isn't just to aid human understanding but is also to be detected by the program. In which case, no, I don't think you can do that -- I don't think your program can introspectively know that the `/ zfoo /` pattern contains a `z`.) – raiph, Nov 11 '17 at 03:25
What would be the input parameters to the sub that should build the token `word`? Should the sub be used before the regex parsing starts to build a predefined `token`, or should it be used within the regex parser? Can you give a simple example? – Håkon Hægland, Nov 11 '17 at 07:25
@raiph Thanks, that's my fault (and I thought about it); I didn't mean introspection into a `regex`. I'll reformulate this part using a `hash`. – Eugene Barsky, Nov 11 '17 at 07:35
@HåkonHægland The input parameters should be `@leftright` and `%middle`, and the pattern should be built before the parsing begins. If there were only one variant, the pattern would be something like `<$left> <$middle> <$right>`. Here it should be `<@leftright[0].key> <%middle{@leftright[0].value}> <@leftright[0].value> | (the same for index == 1 | then 2, 3 etc)...` So the problem is that I don't know, e.g., how to concatenate `regexes` with alternation `|` in a loop. – Eugene Barsky, Nov 11 '17 at 07:46
I've reworked the text of the question. I hope it's better now. – Eugene Barsky, Nov 11 '17 at 08:12
Your solution works with my data, but in the future I need to do the same with a `Regex` middle... – Eugene Barsky, Nov 11 '17 at 10:33

Håkon Hægland · Accepted Answer · 2017-11-11T09:30:18.807


I was not able to do this using token yet, but here is a solution with EVAL and Regex (and also I am using %middle as a hash of Str and not a hash of Regex):

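use MONKEY-SEE-NO-EVAL;   # permit EVAL on a string built at run time

# Sample data; %middle maps each right part to a Str (not a Regex)
my Pair @leftright = A => 'a', Z => 'z';
my %middle = z => 'foo', a => 'bar';

my Regex sub build_pattern (%middle, @leftright) {
    # One alternative per pair, e.g. $<left>='A' $<middle>='bar' $<right>='a'
    my $str = join '|', @leftright.map(
        {join ' ', "\$<left>='{$_.key}'", "\$<middle>='{%middle{$_.value}}'", "\$<right>='{$_.value}'"});
    # Compile the assembled source text into a Regex object
    my Regex $regex = "rx/$str/".EVAL;

    return $regex;
}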

my Regex $pat = build_pattern(%middle, @leftright);

say $pat;
my $res = "Abara" ~~ $pat;
say $res;

Output:

rx/$<left>='A' $<middle>='bar' $<right>='a'|$<left>='Z' $<middle>='foo' $<right>='z'/
｢Abara｣
 left => ｢A｣
 middle => ｢bar｣
 right => ｢a｣

For more information on why I chose to use EVAL, see How can I interpolate a variable into a Perl 6 regex?

edited Nov 11 '17 at 09:30

answered Nov 11 '17 at 08:47

Håkon Hægland


Thanks! That will probably be the most efficient way, but in my case I need to be able to analyze `$/`. E.g. I need to know which `middle` matched. I'll add this to the Q; sorry I didn't write it beforehand. – Eugene Barsky Nov 11 '17 at 09:14

Then maybe put a capture group around the middle part? Like this: `my $pat = rx/'A' ('bar') 'a'|'Z' ('foo') 'z'/`. Then `$0` will contain the middle part after a successful match. – Håkon Hægland Nov 11 '17 at 09:21
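A quick way to try the positional-capture suggestion, assuming the same two sample pairs with string middles. In Raku the capture numbering restarts in each branch of an alternation, so `$0` holds the middle part whichever branch matched.

```raku
# Positional variant: only the middle part is wrapped in a capture group.
my $pat = rx/'A' ('bar') 'a' | 'Z' ('foo') 'z'/;

for <Abara Zfooz> -> $s {
    # Each alternative numbers its captures from 0, so $0 is the middle part
    say ~$0 if $s ~~ $pat;
}
```

This prints `bar` and then `foo`. Named captures, as in the updated answer, avoid having to remember positions once more parts are captured.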
Is it possible to do the same with a named (not positional) capture? That would be a very nice solution for me. – Eugene Barsky Nov 11 '17 at 09:23
So, will the following be correct (modifying your A)? `""`, `""`, "" – Eugene Barsky Nov 11 '17 at 09:26

@EugeneBarsky Yes that is a good idea! See my updated answer for a suggestion – Håkon Hægland Nov 11 '17 at 09:34
That's what I needed to move forward!! Could you please explain the syntax? I mean `\` before `$`, `'..'` around `{}`, and `$<left>=...` instead of `<left=...>`? – Eugene Barsky Nov 11 '17 at 09:38
Sure :) I think the syntax I used with `$` is explained here: https://docs.perl6.org/language/regexes#Named_captures What other parts of the syntax did you refer to? – Håkon Hægland Nov 11 '17 at 09:52
I wanted to ask about escaping `$` and putting `{$_.key}` in `'` quotes. I suppose both mean that these parts of the string are not to be interpolated during `EVAL`? – Eugene Barsky Nov 11 '17 at 10:16

@EugeneBarsky We need to escape the `$` because it is inside double quotes and we do not want it to be interpolated, i.e. the literal `$` must survive the first `join` call. As for the other question: the single quotes are there because we are not using sigspace in the regex, so all literal strings should be quoted. Otherwise we would get a warning when compiling the regex (here: when calling `EVAL`). – Håkon Hægland Nov 11 '17 at 13:26
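To see what the escaping in the answer does, the sketch below builds one fragment in double quotes and the same text in single quotes; the pair `A => 'a'` is sample data. In double quotes, `\$` yields a literal dollar sign while `{...}` is run as a closure and its result interpolated; single quotes interpolate neither.

```raku
my $p = A => 'a';   # sample pair

# Double quotes: \$ stays a literal '$', {$p.key} is evaluated and inserted
my $escaped = "\$<left>='{$p.key}'";
say $escaped;    # $<left>='A'

# Single quotes: nothing is interpolated, the braces stay as plain text
my $literal = '$<left>=\'{$p.key}\'';
say $literal;    # $<left>='{$p.key}'
```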
Thanks, that's what I supposed but wasn't sure. A wonderful technique! – Eugene Barsky Nov 11 '17 at 14:12
@HåkonHægland "I was not able to do this using `token` yet" Just add a [`:r`](https://docs.perl6.org/language/regexes#Ratchet) as the first thing in the regex, then it'll be exactly equivalent to a `token`. – raiph Nov 12 '17 at 00:38
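A sketch of raiph's suggestion applied to the answer's approach, assuming the same two sample pairs with string middles. The `.map`/`.join` builder is a condensed rewrite of `build_pattern`, and the non-capturing `[ ]` group around the alternation is an extra precaution (not part of the comment) so that `:r` clearly applies to every branch.

```raku
use MONKEY-SEE-NO-EVAL;   # permit EVAL on a string built at run time

my @leftright = A => 'a', Z => 'z';        # sample pairs
my %middle    = a => 'bar', z => 'foo';    # sample Str middles

# One alternative per pair, joined with |
my $str = @leftright.map({
    "\$<left>='{.key}' \$<middle>='{%middle{.value}}' \$<right>='{.value}'"
}).join('|');

# :r (ratchet) as the first thing in the regex gives token semantics
my Regex $word = "rx/:r [ $str ]/".EVAL;

my $m = 'Zfooz' ~~ $word;
say join '|', $m<left>, $m<middle>, $m<right>;   # Z|foo|z
```

Named captures inside the `[ ]` group are still attached to the top-level match, so `$<left>`, `$<middle>` and `$<right>` remain available after the match.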
