I'm writing a regular expression in perl to match perl code that starts the definition of a perl subroutine. Here's my regular expression:
my $regex = '\s*sub\s+([a-zA-Z_]\w*)(\s*#.*\n)*\s*\{';
$regex matches code that starts a subroutine. I'm also trying to capture the name of the subroutine in $1 and any white space and comments between the subroutine name and the initial open brace in $2. It's $2 that is giving me a problem.
Consider the following perl code:
my $x = 1;
sub zz
# This is comment 1.
# This is comment 2.
# This is comment 3.
{
$x = 2;
return;
}
When I put this perl code into a string and match it against $regex, $2 is "# This is comment 3.\n", not the three lines of comments that I want. I thought the regular expression would greedily put all three lines of comments into $2, but that seems not to be the case.
I would like to understand why $regex isn't working and to design a simple replacement. As the program below shows, I have a more complex replacement ($re3) that works. But I think it's important for me to understand why $regex doesn't work.
use strict;
use English;
my $code_string = <<END_CODE;
my \$x = 1;
sub zz
# This is comment 1.
# This is comment 2.
# This is comment 3.
{
\$x = 2;
return;
}
END_CODE
my $re1 = '\s*sub\s+([a-zA-Z_]\w*)(\s*#.*\n)*\s*\{';
my $re2 = '\s*sub\s+([a-zA-Z_]\w*)(\s*#.*\n){0,}\s*\{';
my $re3 = '\s*sub\s+([a-zA-Z_]\w*)((\s*#.*\n)+)?\s*\{';
print "\$code_string is '$code_string'\n";
if ($code_string =~ /$re1/) {print "For '$re1', \$2 is '$2'\n";}
if ($code_string =~ /$re2/) {print "For '$re2', \$2 is '$2'\n";}
if ($code_string =~ /$re3/) {print "For '$re3', \$2 is '$2'\n";}
exit 0;
__END__
The output of the perl script above is the following:
$code_string is 'my $x = 1;
sub zz
# This is comment 1.
# This is comment 2.
# This is comment 3.
{
$x = 2;
return;
} # sub zz
'
For '\s*sub\s+([a-zA-Z_]\w*)(\s*#.*\n)*\s*\{', $2 is '# This is comment 3.
'
For '\s*sub\s+([a-zA-Z_]\w*)(\s*#.*\n){0,}\s*\{', $2 is '# This is comment 3.
'
For '\s*sub\s+([a-zA-Z_]\w*)((\s*#.*\n)+)?\s*\{', $2 is '
# This is comment 1.
# This is comment 2.
# This is comment 3.
'