How can I access capture buffers in brackets with quantifiers?

#!/usr/local/bin/perl
use warnings;
use 5.014;

my $string = '12 34 56 78 90';

say $string =~ s/(?:(\S+)\s){2}/$1,$2,/r;
# Use of uninitialized value $2 in concatenation (.) or string at ./so.pl line 7.
# 34,,56 78 90

With @LAST_MATCH_START and @LAST_MATCH_END (@- and @+) it seemed to work*, but the line gets too long. Edit: it doesn't actually work; see TLP's answer.

*"The proof of the pudding is in the eating" isn't always right.

say $string =~ s/(?:(\S+)\s){2}/substr( $string, $-[0], length($-[0]-$+[0]) ) . ',' . substr( $string, $-[1], length($-[1]-$+[1]) ) . ','/re;
# 12,34,56 78 90
sid_com
  • What does "But the line gets to long" mean? – Howard Jul 08 '11 at 11:57
  • Is this an experiment to learn about capture groups and regexes in perl, or do you have an actual problem to solve? If the latter, I think your solution is probably not a good one. – TLP Jul 08 '11 at 14:03
  • And btw, `length($-[0]-$+[0])` is an odd statement, since `$-[0]` and `$+[0]` will be two numbers, denoting the offset in the string where the first match (`$1`) starts and ends. It will always return the same number, which is the length of the number of characters matched by the regex, e.g. `length(4 - 7)` will return 1. `length(44 - 47)` will return 1. – TLP Jul 08 '11 at 14:23
  • My bad, since it's a negative number it will return 2. `length(-3)` returns 2. – TLP Jul 08 '11 at 14:44
  • @Howard: The line of code is too long; it's for a one-liner. – sid_com Jul 08 '11 at 16:44

2 Answers

You can't access all previous values of the first capturing group. Only the last value (or, seen another way, the value current at the end of the match) is saved in $1, unless you want to use a (?{ code }) hack.

For your example you could use something like:

s/(\S+)\s+(\S+)\s+/$1,$2,/
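
To make the difference concrete, here is a minimal sketch that is not part of the original answer. The names $quantified, $explicit, @seen and $hacked are made up for illustration. It shows that a quantified group keeps only its last iteration, that two explicit groups fill both $1 and $2, and one possible shape of the (?{ code }) hack, which assumes the regex never backtracks over the code block.

```perl
use strict;
use warnings;
use 5.014;

my $string = '12 34 56 78 90';

# One capture group repeated twice: $1 keeps only the last iteration,
# and there is no $2 at all, because the pattern has a single group.
$string =~ /(?:(\S+)\s){2}/ or die "no match";
my $quantified = $1;
say $quantified;                 # 34

# Two explicit groups: each one gets its own capture variable.
my $explicit = $string =~ s/(\S+)\s+(\S+)\s+/$1,$2,/r;
say $explicit;                   # 12,34,56 78 90

# Sketch of the (?{ code }) hack: record every iteration as it happens.
# $^N holds the text of the most recently closed group. A package
# variable is used because lexicals in code blocks were fragile
# before Perl 5.18.
# Caveat: if the regex backtracks, @seen may contain stale entries.
our @seen = ();
my $hacked = $string =~ s/(?:(\S+)\s(?{ push @seen, $^N })){2}/join(',', @seen) . ','/er;
say $hacked;                     # 12,34,56 78 90
```

For a fixed, small number of fields the explicit version is the easier one to read; the code-block variant is only worth considering when the repeat count is not known in advance.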
Qtax

The statement that you say "works" has a bug in it.

length($-[0]-$+[0]) 

This returns the number of characters in the negated match length, not the match length itself. The numbers $-[0] and $+[0] are the offsets of the start and end of the whole match in the string, respectively. Since the whole match is six characters long in this case, start minus end is -6, and length(-6) is 2. The same happens for the capture group: $-[1] minus $+[1] is -2, and length(-2) is also 2.

So, what you are doing is taking the first two characters of the match 12 34, and the first two characters of the match 34 and concatenating them with a comma in the middle. It works by coincidence, not because of capture groups.
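
The coincidence can be checked directly. The following is a small sketch, with illustrative variable names such as $m_start and $buggy_len that do not come from the question, that prints the offsets, the value of the buggy length expression, and the substrings obtained with the usual end-minus-start arithmetic. Note that even with correct arithmetic, @- and @+ only describe the last iteration of the quantified group, so they cannot recover 12 as a separate capture.

```perl
use strict;
use warnings;
use 5.014;

my $string = '12 34 56 78 90';

$string =~ /(?:(\S+)\s){2}/ or die "no match";

# Copy the offsets right away; @- and @+ are reset by the next match.
my ($m_start, $m_end) = ($-[0], $+[0]);   # whole match '12 34 '
my ($g_start, $g_end) = ($-[1], $+[1]);   # group 1, last iteration '34'

say "$m_start $m_end $g_start $g_end";    # 0 6 3 5

# The buggy expression: length() counts the characters of the number -6.
my $buggy_len = length($m_start - $m_end);
say $buggy_len;                           # 2

# The real lengths come from end minus start.
my $whole = substr($string, $m_start, $m_end - $m_start);
my $last  = substr($string, $g_start, $g_end - $g_start);
say "[$whole]";                           # [12 34 ]
say "[$last]";                            # [34]
```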

It sounds as though you are asking us to solve the problems you have with your solution, rather than asking us about the main problem.

TLP