
Can anyone explain regular expression text substitutions when the regular expression is held in a variable? I'm trying to process some text, Clearcase config specs actually, and substitute text as I go. The rules for the substitution are held in an array of hashes that have the regular expression to match and the text to substitute.

The input text looks something like this:

element  /my_elem/releases/...  VERSION_STRING.020 -nocheckout

Most of the substitutions simply remove lines that contain a specific text string, and this works fine. In some cases I want to substitute the text but re-use the VERSION_STRING text. I've tried using $1 in the substitution expression, but it doesn't work: $1 gets the version string in the match, but $1 is not replaced in the substitution.

In these cases the output should look something like this:

element  -directory  /my_elem/releases/... VERSION_STRING.020 -nocheckout
element  /my_elem/releases/.../*.[ch]  VERSION_STRING.020 -nocheckout

That is, one input line became two output lines, and the version string has been re-used in both.

The code looks something like this. First the regular expressions and substitutions:

my @Special_Regex = (   
                  { regex => "\\s*element\\s*\/my_elem_removed\\s*\/main\/\\d+\$",                  subs => "# Line removed" },
                  { regex => "\\s*element\\s*\/my_elem_changed\/releases\/\.\.\.\\s*\(\.\*\$\)", 
                    subs => "element  \-directory  \/my_elem\/releases\/\.\.\. \\1\nelement  \/my_elem\/releases\/\.\.\.\/\*\.\[ch\]  \\1" }

                );

In the second regex the variable $1 is defined in the portion (.*\$) and this is working correctly. The subs expression does not substitute it, however.

 foreach my $line (<INFILE>)
        {
        chomp($line);
        my $test = $line;
        foreach my $hash (@Special_Regex)
        {
            my $regex = qr/$hash->{regex}/is;
            if($test =~ s/$regex/$hash->{subs}/)
                {
                print "$test\n";
                print "$line\n";
                print "$1\n";
                }
         }
}

What am I missing? Thanks in advance.

Alan Moore
0xDEADBEEF
  • Don't use ddoouubbllee slackbashed strings for regexes, then compile it all those times. Just make the hash values `qr//` strings directly. Don't use `\\1` on the RHS of substitutions! And please get rid of those ugly LTS strings. – tchrist Nov 03 '10 at 17:07
  • I am sure someone will be willing to read the post. In the meantime, please do yourself and anyone who has to read the code a favor and look up `\Q` in `perldoc perlreref`. – Sinan Ünür Nov 03 '10 at 17:08
  • Fair comment. This code has been through several revisions while I've been experimenting - I removed the qr to have control over what was escaped and what was not. You can trust me that the regexps work except for the $1 \1 substitutions. – 0xDEADBEEF Nov 03 '10 at 17:09

2 Answers


The replacement side of your substitution is only interpolated once, which turns $hash->{subs} into the string it holds. You need to evaluate that string again to interpolate the variables inside it. Adding the e modifier to the end of the substitution tells Perl to treat the replacement as an expression to evaluate rather than as a string, and you can apply more than one e flag to evaluate the result again (if you have a problem that needs it). As tchrist helpfully points out, this case needs ee: the first evaluation just expands the variable, and the second is needed to expand the variables in the expansion.

You can find more detail in perlop about the s operator.
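As a rough illustration of the difference, here is a minimal sketch. The pattern, the sample text, and the variable names are made up for the example and are not from the question. Note that with ee the stored replacement has to be valid Perl source, so it is kept here as a double-quoted literal inside a single-quoted string.

```perl
use strict;
use warnings;

# Hypothetical rule: the replacement is stored as Perl source text, i.e. a
# double-quoted string literal kept inside a single-quoted string.
my $regex = qr/^(\w+)\s+(\w+)$/;
my $subs  = '"$2 $1"';

my $text = 'hello world';

# Plain substitution: $subs is interpolated once, its contents stay literal.
(my $once = $text) =~ s/$regex/$subs/;

# /ee: the first evaluation yields the stored string, the second runs it as code.
(my $twice = $text) =~ s/$regex/$subs/ee;

print "$once\n";
print "$twice\n";
```

The first line printed is the literal text `"$2 $1"` and the second is `world hello`. Because ee runs whatever code the string contains, it is only reasonable when the rules come from a trusted source.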

daxim
Eric Strom
  • Eric, note that having a RHS on a substitute be `$foo` is the same with and without `/e`, which is why that sort of thing always requires a `/ee` instead. – tchrist Nov 03 '10 at 17:38

There is no compilation for a replace expression. So about the only thing you can do is exec or eval it with the e flag:

if($test =~ s/$regex/eval qq["$hash->{subs}"]/e ) { #...

worked for me after changing \\1 to \$1 in the replacement strings.

s/$regex/$hash->{subs}/

only replaces the matched part with the literal value stored in $hash->{subs}, taken as the complete substitution. To get the substitution working, you have to force Perl to evaluate that value as a double-quoted string, which means you have to add the double quotes back in to get the interpolating behavior you are looking for (because they are not part of the stored string).
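A small self-contained sketch of that eval approach might look like the following. The pattern and replacement here are simplified stand-ins rather than the rules from the question, and the replacement is written in single quotes so that $1 and $2 reach the eval unexpanded.

```perl
use strict;
use warnings;

# Simplified, hypothetical rule for illustration only.
my $regex = qr{element\s+(\S+)\s+(.*)$};
my $subs  = 'element  -directory  $1 $2';   # single quotes keep $1 and $2 literal

my $line = 'element  /my_elem/releases/...  VERSION_STRING.020 -nocheckout';

# qq["$subs"] wraps the stored text in double quotes; eval then interpolates it
# while $1 and $2 from the current match are still set.
(my $changed = $line) =~ s/$regex/eval qq["$subs"]/e;

print "$changed\n";
```

One limitation of this sketch: a replacement string that itself contains a double quote would break the generated code, which is part of why this approach is clumsy.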

But that's kind of clumsy, so I changed the replace expressions into subs:

my @Special_Regex 
    = ( 
        { regex => qr{\s*element\s+/my_elem_removed\s*/main/\d+$}
        , subs  => sub { '#Line removed' }
        }
    ,   { regex => qr{\s*element\s+/my_elem_changed/releases/\.\.\.\s*(.*$)}
        , subs  => sub { 
            return "element  -directory  /my_elem/releases/... $1\n"
                 . "element  /my_elem/releases/.../*.[ch]  $1"
                 ; 
          }
        }

    );

I got rid of a bunch of stuff that you don't have to escape in a substitution expression. Since what you want to do is interpolate the value of $1 into the replacement string, the subroutine does simply that. And because $1 will be visible until something else is matched, it will be the right value when we run this code.

So now the replacement looks like:

s/$regex/$hash->{subs}->()/e

Of course making it pass $1 makes it a little more bulletproof, because you're not depending on the global $1:

s/$regex/$hash->{subs}->( $1 )/e

Of course, you would change the sub like so:

subs => sub {
    my $c1 = shift;
    return "element  -directory  /my_elem/releases/... $c1\n"
         . "element  /my_elem/releases/.../*.[ch]  $c1"
         ; 
}
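Putting the pieces together, the whole loop could be sketched roughly as below. The rules are the ones from this answer; the sample input lines, the `@input` and `@output` arrays, and the choice to stop at the first matching rule are assumptions made for the example.

```perl
use strict;
use warnings;

# Rules from the answer above; each sub receives the first capture as an argument.
my @Special_Regex = (
    { regex => qr{\s*element\s+/my_elem_removed\s*/main/\d+$},
      subs  => sub { '# Line removed' } },
    { regex => qr{\s*element\s+/my_elem_changed/releases/\.\.\.\s*(.*$)},
      subs  => sub {
          my $c1 = shift;
          return "element  -directory  /my_elem/releases/... $c1\n"
               . "element  /my_elem/releases/.../*.[ch]  $c1";
      } },
);

# Sample input lines, made up for illustration.
my @input = (
    'element  /my_elem_changed/releases/...  VERSION_STRING.020 -nocheckout',
    'element  /my_elem_removed /main/3',
    'element  /untouched/...  /main/LATEST',
);

my @output;
for my $line (@input) {
    for my $hash (@Special_Regex) {
        my $regex = $hash->{regex};
        # /e calls the code ref and uses its return value as the replacement.
        # Assumption: the first rule that matches wins.
        last if $line =~ s/$regex/$hash->{subs}->($1)/e;
    }
    push @output, $line;
}

print "$_\n" for @output;
```

Lines that match no rule pass through unchanged, and the regexes are compiled once when the array is built rather than on every input line.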

Just one last note: "\.\.\." didn't do what you think it did. Inside a double-quoted string the backslashes are dropped, so you just ended up with '...' in the regex, which matches any three characters.
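The effect is easy to check with a short sketch. The variable names are invented for the example, and the `no warnings 'misc'` line is only there to silence the "Unrecognized escape" warning that Perl raises for `\.` in a double-quoted string.

```perl
use strict;
use warnings;

my $from_string;
{
    no warnings 'misc';          # silence "Unrecognized escape \. passed through"
    $from_string = "\.\.\.";     # double quotes drop the backslashes
}
my $from_qr = qr/\.\.\./;        # backslashes survive: three literal dots

my $loose = qr/$from_string/;    # effectively qr/.../, any three characters

print "string form is '$from_string'\n";
print "loose pattern matches 'abc'\n" if 'abc' =~ $loose;
print "qr pattern matches 'abc'\n"    if 'abc' =~ $from_qr;
print "qr pattern matches '...'\n"    if '...' =~ $from_qr;
```

Building the pattern with `qr//`, or doubling the backslashes in the string, keeps the dots literal.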

Axeman
  • Many thanks for the answer - both versions worked well, and your answer is very informative. I have gone with your slightly more elegant 'sub' version, but without parameters in case a regular expression has more than one matching variable. Just one minor point - isn't the substitution regex missing the final evaluation e? s/$regex/$hash->{subs}->()/e works for me. – 0xDEADBEEF Nov 04 '10 at 07:53