
I've been poring over perldoc perlre as well as the Regular Expressions Cookbook and related questions on Stack Overflow, and I can't seem to find what appears to be a very useful expression: how do I know the number of the current match?

There are expressions for the last closed group match ($^N), contents of match 3 (\g{3} if I understood the docs correctly), $', $& and $`. But there doesn't seem to be a variable I can use that simply tells me what the number of the current match is.

Is it really missing? If so, is there any explained technical reason why it is a hard thing to implement, or am I just not reading the perldoc carefully enough?

Please note that I'm interested in a built-in variable, NOT workarounds like using (?{ $count++ }).
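For reference, the counter workaround mentioned above can be sketched roughly as follows. This is only an illustration, assuming a code assertion placed at the very end of the pattern so that it runs once per successful match; `$count`, `$text`, and `@hits` are hypothetical names, not anything Perl provides.

```perl
use strict;
use warnings;

# Hypothetical sample data. A package variable is used for the counter
# because lexicals inside regex code blocks were unreliable before Perl 5.18.
our $count = 0;
my $text = 'EaEbEcE';

# The code assertion sits at the end of the pattern, so it only runs
# after the preceding "E" has matched at the current position.
my @hits = $text =~ /(E)(?{ $count++ })/g;

print "matches seen: $count\n";
```

Because nothing follows the assertion in the pattern, there is no backtracking past it, which is what keeps the count equal to the number of matches in this sketch.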

For context, I'm trying to build a regular expression that would match only some instances of a match (e.g. match all occurrences of character "E" but do NOT match occurrences 3, 7 and 10 where 3, 7 and 10 are simply numbers in an array). I ran into this when trying to construct a more idiomatic answer to this SO question.

I want to avoid evaluating regexes as strings to actually insert 3, 7 and 10 into the regex itself.

DVK
  • Please note that I need the # of matches, not # of captured groups. – DVK Aug 11 '12 at 15:22
  • `Please note that I'm interested in a built-in variable` : If it's not in perldoc perlvar, does it exist? I've assumed that perlvar contains *all* perl built-in variables. – TLP Aug 11 '12 at 17:51
  • There is no such variable. perlvar doesn't document all built-in variables - for instance `@ISA` doesn't appear - but everything is documented somewhere. Perl tends not to have hidden functionality. Can you give an example of the problem you are trying to solve that is better than the reference you give? – Borodin Aug 11 '12 at 19:58
  • Why is counting matches a "workaround"? Who or what exactly is preventing you from using it? – Oleg V. Volkov Aug 11 '12 at 21:50
  • @TLP - perlre is a bit of a convoluted read, by necessity. I'm not at all convinced that it doesn't exist based on **my** reading of it (vs. that it exists and I missed the reference) – DVK Aug 11 '12 at 23:56
  • @Borodin - I'm unsure of how my example of the problem is deficient, so I'd need to know why you don't find it acceptable to come up with a "better" one. I was typing up a question regarding why my solution to the problem didn't work, but SO for some reason crashed mid-day and ate my 30 minute effort before I could post. – DVK Aug 11 '12 at 23:59
  • @DVK: OK, suppose there was a variable, `${^MATCH_COUNT}`, that did what you wanted, please show an example of how you would use it – Borodin Aug 12 '12 at 02:31
  • @Borodin - see second to last paragraph of the question. A better example is likely forthcoming in my next question assuming I can't figure out the issue myself – DVK Aug 12 '12 at 02:57
  • Code assertions are "built-in", and avoiding them is complication for the sake of complication. – hobbs Aug 12 '12 at 17:05
  • Have you played with the `Regexp::Debugger` module? – tchrist Aug 13 '12 at 00:57
  • @tchrist - my main problem **seemed** to be with scoping, so I didn't try that yet. Great idea overall though. – DVK Aug 13 '12 at 10:22

2 Answers


I'm completely ignoring the actual utility or wisdom of using this for the other question.

I thought @- or @+ might do what you want since they hold the offsets of the numbered matches, but it looks like the regex engine already knows what the last index will be:

```perl
use v5.14;
use Data::Printer;

$_ = 'abc123abc345abc765abc987abc123';

my @matches = m/
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    .*?
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    /x;

say "Matches: @matches";
```

This gives strings that show the last index as 2 even though it hasn't matched $2 yet.

Matched \$2 group with 123
[
    [0] 6,
    [1] 6,
    [2] undef
]
Matched \$2 group with 345
[
    [0] 12,
    [1] 6,
    [2] 12
]
Matches: 123 345

Notice that the first time around, $+[2] is undef, so that one hasn't been filled in yet. You might be able to do something with that, but I think that's probably getting away from the spirit of your question. If you were really fancy, you could create a tied scalar that has the value of the last defined index in @+, I guess.
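As a lighter-weight alternative to a tied scalar, the "last defined index in @+" idea can be sketched with a plain helper. This is an untested-in-production illustration, not a built-in: `last_defined_group` and `@seen` are hypothetical names, and the sketch assumes that unclosed groups show up as undef in `@+` inside a code assertion, as in the output above.

```perl
use strict;
use warnings;

# Hypothetical helper: given a copy of @+, return the highest group
# number whose end offset is defined (0 if no group has closed yet).
sub last_defined_group {
    my ($ends) = @_;
    for my $i ( reverse 1 .. $#{$ends} ) {
        return $i if defined $ends->[$i];
    }
    return 0;
}

# Package variable, so the code assertions can see it on older Perls too.
our @seen;

'abc123abc345' =~ m/
    ([0-9]+)
    (?{ push @seen, last_defined_group( [@+] ) })   # copy @+ while still inside the match
    .*?
    ([0-9]+)
    (?{ push @seen, last_defined_group( [@+] ) })
    /x;

print "Closed groups seen at each assertion: @seen\n";
```

Note that this still reports the number of the last closed capture group within one match attempt, not the number of the match in a global match, so it does not answer the original question either.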

brian d foy

I played around with this for a bit. Again, I know that this is not really what you are looking for, but I don't think that exists in the way you want it.

I had two thoughts. First, with a split using separator retention mode, you get the separators (the matches themselves) as the odd-indexed elements in the output list. With the list from the split, you count which match you are on and put it back together how you like:

```perl
use v5.14;

$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';

my @bits = split /(\d+)/; # separator retention mode

my @skips = qw(3 7 10);
my $s = '';
while( my( $index, $value ) = each @bits ) {
    # odd indices hold the matches; match number n sits at index 2n - 1
    if( $index % 2 and ! ( ( $index + 1 )/2 ~~ @skips ) ) {
        $s .= '^';   # replace every match that is not in the skip list
        }
    else {
        $s .= $value;
        }
    }

say $s;
```

I get:

ab^cdef^gh3ij^k^lmn^op7qr^stu^vw10xyz

I thought I really liked my split answer until I had the second thought. Does state work inside a substitution? It appears that it does:

```perl
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @skips = qw(3 7 10);

s/(\d+)/
    state $n = 0;
    $n++;
    $n ~~ @skips ? $1 : '$'
    /eg;

say;
```

This gives me:

    ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz

I don't think you can get much simpler than that, even if that magic variable existed.
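One caveat on the code above: the smartmatch operator (`~~`) has been marked experimental since Perl 5.18 and was deprecated in Perl 5.38, so newer Perls warn about it. A minimal sketch of the same substitution without smartmatch, assuming a hash lookup in place of the array search (`%skip`, `$text`, and `$result` are names introduced only for this illustration):

```perl
use strict;
use warnings;

my $text = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';

# Hypothetical lookup table: the match numbers that should be left alone.
my %skip = map { $_ => 1 } ( 3, 7, 10 );

# Plain lexical counter instead of a state variable; it is bumped once
# per match because the replacement runs once per match under /eg.
my $n = 0;

( my $result = $text ) =~ s/(\d+)/ $skip{ ++$n } ? $1 : '$' /eg;

print "$result\n";   # prints ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz
```

The counter lives outside the substitution here, so it has to be reset by hand if the substitution is run again on another string.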

I had a third thought which I didn't try. I wonder if state works inside a code assertion. It might, but then I'd have to figure out how to use one of those to make a match fail, which really means it has to skip over the bit that might have matched. That seems really complicated, which is probably what Borodin was pressuring you to show even in pseudocode.
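For what it's worth, that third idea might look something like the sketch below. It is only a guess at one way to do it, assuming a conditional pattern whose condition is a code block and the `(*FAIL)` verb (both documented in perlre); `$n`, `%skip`, `$text`, and `@offsets` are hypothetical names. It also relies on the pattern being a single character, so that each candidate position is tried exactly once and the counter is not bumped by backtracking.

```perl
use strict;
use warnings;

# Package variables, so the code block can see them on older Perls too.
our $n    = 0;
our %skip = ( 2 => 1 );      # hypothetical: refuse to match the 2nd "E"

my $text = 'aEbEcEdE';
my @offsets;

# (?(?{ CODE })yes-pattern): when the code block returns true, the
# yes-pattern (*FAIL) forces this attempt to fail, and the engine moves
# on to look for the next "E".
while ( $text =~ /E(?(?{ $skip{ ++$n } })(*FAIL))/g ) {
    push @offsets, $-[0];
}

print "matched at offsets: @offsets\n";
```

With a longer pattern, backtracking could run the code block more than once per occurrence, so the counter would need more care than this sketch takes.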

brian d foy