Why do @- and @+ have different sizes after a Perl regex match?

Question

I expected @- and @+ to be the same size after a successful match, but they aren't. For example, this script:

 #!/usr/bin/perl -w

 use strict;
 use warnings;

 use v5.20.0;

 use Data::Dumper;
 $Data::Dumper::Terse = 1;

 my $rex = '([a-z]+) | ([0-9]+) | ([A-Z]+)';

 my $str = '9999';

 if ( $str =~ m/$rex/x ) {
     say Dumper(\@{^CAPTURE});
     say Dumper(\@-);
     say Dumper(\@+);
 }

produces this output:

 [
   undef,
   '9999'
 ]

 [
   0,
   undef,
   0
 ]

 [
   4,
   undef,
   4,
   undef
 ]

It looks like @- doesn't include undef for trailing unmatched groups, while @+ does. Why the difference?

While looking for an alternative: is this check guaranteed to pass for every $str and $rex?

 if ( $str =~ m/$rex/x ) {
      scalar(@{[$&, @{^CAPTURE}]}) == scalar(@-) or die;
 }

Rocco · Answer 1 · 2023-03-17T17:51:49.430

This is exactly what perl does. $#- gives the index of the last matched subgroup in the last successful match, while $#+ gives the number of subgroups in the regular expression of the last successful match. So @- stops at the last group that actually matched, whereas @+ has an entry for every group in the pattern, and the two arrays differ in length whenever the trailing groups did not match. Also, $n is defined if $-[n] is defined. If you need @- and @+ to have the same size after a successful match, perl will not give you that directly, but you can use @- together with the slice @+[0..$#-]. [ Ref : @-, @+, $+ ]
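A minimal sketch of the above, reusing the pattern and string from the question. The variable names ($last_matched, $group_count, @starts, @ends) are only illustrative, and the values in the comments are what the output shown in the question implies:

```perl
use strict;
use warnings;

my $str = '9999';
my ( $last_matched, $group_count, @starts, @ends );

if ( $str =~ m/ ([a-z]+) | ([0-9]+) | ([A-Z]+) /x ) {
    $last_matched = $#-;    # index of the last group that matched (2 here)
    $group_count  = $#+;    # number of groups in the pattern (3 here)

    # Copy the offsets right away; the next successful match overwrites them.
    @starts = @-;              # (0, undef, 0)
    @ends   = @+[ 0 .. $#- ];  # @+ trimmed to the length of @-: (4, undef, 4)
}

printf "last matched group: %d, groups in pattern: %d\n",
    $last_matched, $group_count;
printf "starts: %d elements, ends: %d elements\n",
    scalar @starts, scalar @ends;
```

Taking the slice rather than all of @+ drops only the entries for trailing unmatched groups, which are undef anyway, so no offset information is lost.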

edited Mar 17 '23 at 17:51

answered Mar 16 '23 at 17:35

Rocco


The first paragraph of the docs for the @- variable should be changed because it says "$-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match" (which isn't true). The subsequent paragraph does clarify the situation, but it's bad docs to have first paragraph say untrue thing. – Britton Kerin Mar 16 '23 at 18:12
Re "*This is exactly what perl was designed to do.*", What's your basis for this? The docs for @+ and @- both refer to "last successful submatches", so you didn't get that from the docs. – ikegami Mar 17 '23 at 15:00
@Britton Kerin, There's nothing untrue. In your example, `$-[3]` is `undef`. – ikegami Mar 17 '23 at 15:02
@ikegami To get the least controversial answer, we can refer to the `Perl` [source code](https://github.com/Perl/perl5/blob/8552f09f5cfe61a536a65f11290ef026f7aa0356/mg.c#L628-L662). – Rocco Mar 17 '23 at 15:34
@ikegami Re "*There's nothing untrue. In your example, `$-[3]` is `undef`.*" : No, this is exactly the difference between array `(undef)` and `()`. Also, note the difference between the last successful **match** and the last successful **submatch**. – Rocco Mar 17 '23 at 16:10
@Rocco, The array might not have four elements, but the docs doesn't say it has four element. It says `$-[3]` should be `undef`. And `$-[3]` is indeed `undef`. So I repeat myself, there's nothing in the documentation that's untrue. – ikegami Mar 17 '23 at 16:27
@Rocco, Re "*Also, note the difference between the last successful match and the last successful submatch.*", I'm aware. And the docs for both talks about substring/subpattern/submatch. Like I said. So I don't understand your comment. – ikegami Mar 17 '23 at 16:27
@Rocco, Re "*To get the least controversial answer, we can refer to the Perl source code.*", There's no controversy about what Perl does. We all agree on that. At issue is your claim that this was by design. I asked for the basis of this claim, as it sounds like you made it up. What's your source? – ikegami Mar 17 '23 at 16:34
@ikegami Re "*The array might not have four elements, but the docs doesn't say it has four element. It says `$-[3]` should be `undef`. And `$-[3]` is indeed `undef`. So I repeat myself, there's nothing in the documentation that's untrue.*" : I agree with you, my intention is only the difference in the number of elements, sorry. – Rocco Mar 17 '23 at 17:53
@ikegami Re "*Re "Also, note the difference between the last successful match and the last successful submatch.", I'm aware. And the docs for both talks about substring/subpattern/submatch. Like I said. So I don't understand your comment.*" : Sorry for the vague comment. I thought you didn't understand me correctly, so I'm explaining my answer because I'm both using the last successful **match**. – Rocco Mar 17 '23 at 17:54
@ikegami Re "*There's no controversy about what Perl does. We all agree on that. At issue is your claim that this was by design. I asked for the basis of this claim, as it sounds like you made it up. What's your source?*" : Yeah, I just simply talked about how perl does it, rather than explaining the underlying reasons why it is so designed. My statement may be inaccurate, I have revised the answer, thanks for your review. – Rocco Mar 17 '23 at 17:55
Re "*rather than explaining the underlying reasons why*", But that's the question the OP asked X_X. Explaining what it does is already done by the question, so that's not useful! – ikegami Mar 17 '23 at 18:34
The sentence is at best ambiguous because unmatched groups that precede matched ones do get an undef, and it's therefore reasonable to expect later ones will also. There's no reason not to word it better. – Britton Kerin Mar 18 '23 at 18:32
Except you do get an `undef` from `$-[3]`. There's nothing ambiguous about it. You just made assumptions about the number of elements in `@-`, on which the documentation is silent. The non-existent statement about the number of elements `@-` can't possibly be ambiguous. – ikegami Mar 18 '23 at 21:41
There are two different behaviors going on and the docs lump them into a single description that doesn't discriminate. That's ambiguity. – Britton Kerin Mar 20 '23 at 03:49
