10

I'm attempting to write a Perl 6 regex for this code-golf challenge that splits a string with the rules:

  • Sequences of the same character with length 3 or less will be grouped together
  • But 4 or more will result in the first two being grouped before checking the rest

For example:

66667888    -> '66', '66', '7', '888'
19999999179 -> '1', '99', '99', '999', '1', '7', '9'
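To pin down the two rules independently of any regex, here is a minimal reference sketch in plain Perl 6 (now Raku). The helper name `split-runs` is hypothetical and not part of the challenge; the sketch only assumes the rules exactly as stated above: a run of three or fewer identical characters is taken whole, and a longer run gives up its first two characters before the remainder is checked again.

```raku
# Hypothetical helper (not from the challenge): split a string by the two rules.
sub split-runs(Str $s) {
    my @groups;
    my @chars = $s.comb;
    my $i = 0;
    while $i < @chars.elems {
        # Measure the run of identical characters starting at position $i.
        my $run = 1;
        $run++ while $i + $run < @chars.elems && @chars[$i + $run] eq @chars[$i];
        # Runs of 1 to 3 are taken whole; longer runs give up only their first two.
        my $take = $run <= 3 ?? $run !! 2;
        @groups.push: @chars[$i] x $take;
        $i += $take;
    }
    return @groups;
}

say split-runs('66667888');     # [66 66 7 888]
say split-runs('19999999179');  # [1 99 99 999 1 7 9]
```

Any regex solution should produce the same groups as this loop for the same input.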

I thought the regex m:g/(.)[$0$0<!$0>|$0?]/ would work, but using a capture in the negative lookahead seems to break it, and I can't figure out how to use it properly.

Depending on how I use it, it either loops forever, throws the error Cannot resolve caller INTERPOLATE_ASSERTION, or returns the wrong result. Is there a proper way to use captures in lookaheads, or is this a bug?

Elizabeth Mattijsen
Jo King
  • Why does `6666` become `66` and `66` rather than `666` and `6`, given that your rule states that sequences of 3 or less will be grouped together? – Tim Biegeleisen Aug 16 '19 at 01:32
  • @TimBiegeleisen `But 4 or more will result in the first two being grouped before checking the rest`. So 4 in a row will be grouped as 2 and 2, not 3 and 1, while 5 will be 2 then 3. – Jo King Aug 16 '19 at 01:33

1 Answer

6

According to the Capturing section of the documentation, you need to use a code block to make these capture variables visible inside the regex:

These capture variables are only available outside the regex... In order to make them available inside the regex, you need to insert a code block behind the match; this code block may be empty if there's nothing meaningful to do

Use

given "19999999179" {
  for m:g/(.) {} :my $c = $0; ([ $c$c<!$c> | $c? ])/  -> $match {
    say ~$match;
  }
}

Results:

1
99
99
999
1
7
9

See the Perl6 demo.

In this scenario, you may contract the pattern to m:g/(.) {} [ $0$0<!$0> | $0? ]/:

my @matches;
given "19999999179" {
  for m:g/(.) {} [ $0$0<!$0> | $0? ]/ -> $match {
    @matches.push: (~$match);
  }
}
say @matches;

Results in [1 99 99 999 1 7 9].

See this Perl6 demo.

Wiktor Stribiżew
  • 1
    Ah interesting. For golfing purposes though, you can just insert the `{}` after the capture to force the assignment, as long as you're not doing another capture. [Demo](https://tio.run/##K0gtyjH7/z89syw1T0HJ0BICDM0tlRSquRQU0vKLFHKt0vU19DSra6NVDFQMbBRVDOxqVAzsY/UVFHTtFFRyE0uSM8CKFRSKEysV6iAi1kCBWq7a//8B) – Jo King Aug 16 '19 at 08:41
  • 2
    FYI, for golfing and perhaps a little more speed and readability you can simplify to `m:g/(.) $0? [ $0 {} <!$0> ]? /`, dropping a `$0` and a `|` alternation (a better golf), and putting the `{}` immediately before the construct that currently requires it (the `<!$0>`) for arguably better readability and perhaps a tiny speed up. – raiph Aug 17 '19 at 04:04
  • Yes, the original pattern can be translated (for regex "debug" purposes) into the Perl 5 regex [`(.)(?:\1\1(?!\1)|\1?)`](https://regex101.com/r/UoIy7N/2), which can indeed be improved a little to [`(.)\1?(?:\1(?!\1))?`](https://regex101.com/r/UoIy7N/2) (the PCRE test shows 6 fewer steps for the string in question). So, the `m:g/(.) $0? [ $0 {} <!$0> ]? /` regex is OK here. – Wiktor Stribiżew Aug 19 '19 at 07:45
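
As a quick check of the simplified pattern from the comments, the sketch below runs it over both inputs from the question, reusing the `given` / `for m:g/.../` loop from the answer. The wrapper name `golf-split` is hypothetical and exists only to make the check reusable. If the pattern behaves as the comments describe, each line printed should list the same groups as the examples in the question.

```raku
# Hypothetical wrapper around the simplified pattern from the comments.
sub golf-split(Str $input) {
    my @groups;
    given $input {
        # The empty {} sits right before <!$0>, the construct that needs $0 to be set.
        for m:g/(.) $0? [ $0 {} <!$0> ]? / -> $match {
            @groups.push: ~$match;
        }
    }
    return @groups;
}

# The two example inputs from the question.
for '66667888', '19999999179' -> $input {
    say $input, ' -> ', golf-split($input).join(' ');
}
```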