24

I need to pass a regex substitution as a variable:

sub proc {
    my $pattern = shift;
    my $txt = "foo baz";

    $txt =~ $pattern;
}

my $pattern = 's/foo/bar/';
proc($pattern);

This, of course, doesn't work. I tried eval'ing the substitution:

eval("$txt =~ $pattern;");

but that didn't work either. What horribly obvious thing am I missing here?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
ceo
  • 1,138
  • 1
  • 8
  • 16

9 Answers9

32

I need to pass a regex substitution as a variable

Do you? Why not pass a code reference? Example:

sub modify
{
  my($text, $code) = @_;
  $code->($text);
  return $text;
}

my $new_text = modify('foo baz', sub { $_[0] =~ s/foo/bar/ });

In general, when you want to pass "something that does something" to a subroutine ("a regex substitution" in the case of your question) the answer is to pass a reference to a piece of code. Higher Order Perl is a good book on the topic.

John Siracusa
  • 14,971
  • 7
  • 42
  • 54
  • 3
    This worked, and is closest to what I had in mind. However, the resulting code is a bit funky and convoluted for my tastes, which I generally take as a hint that it's time to rethink my overall approach. – ceo Sep 24 '08 at 16:36
  • I'm new to Perl, so could you please explain what does the operator `->` do? What is the purpose of `$code->($text)`? Thanks. – roxrook Apr 25 '13 at 00:46
  • The `->()` part dereferences $code as a code reference and executes it, passing the contents of the `()` as arguments. More here: http://perldoc.perl.org/perlref.html#Using-References – John Siracusa Apr 25 '13 at 19:05
8

Well, you can precompile the regular expression using the qr// operator. But you can't pass an operator (s///).

$pattern = qr/foo/;

print "match!\n" if $text =~ $pattern;

But if you have to pass the substitution operator, you are down to passing either code or strings:

proc('$text =~ s/foo/bar');

sub proc {
   my $code = shift;

   ...

   eval $code;
}

or, code:

proc(sub {my $text = shift;  $text =~ s/foo/bar});

sub proc {
   my $code = shift;

   ...

   $code->("some text");
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
zigdon
  • 14,573
  • 6
  • 35
  • 54
8
sub proc {
    my($match, $subst) = @_;
    my $txt = "foo baz";
    $txt =~ s/$match/$subst/;
    print "$txt\n";
}

my $matcher = qr/foo/;
my $sub_str = "bar";

proc($matcher, $sub_str);

This rather directly answers your question. You can do more - but when I used a qr// term instead of the $sub_str as a simple literal, then the expanded regex was substituted.

I recently needed to create a parser (test parser) for statements with some peculiar (dialect of) SQL types, recognizing lines such as this, splitting it into three type names:

input: datetime year to second,decimal(16,6), integer

The script I used to demo this used quoted regexes.

#!/bin/perl -w
use strict;
while (<>)
{
    chomp;
    print "Read: <$_>\n";
    my($r1) = qr%^input\s*:\s*%i;
    if ($_ =~ $r1)
    {
        print "Found input:\n";
        s%$r1%%;
        print "Residue: <$_>\n";
        my($r3) = qr%(?:year|month|day|hour|minute|second|fraction(?:\([1-5]\))?)%;
        my($r2) = qr%
                        (?:\s*,?\s*)?   # Commas and spaces
                        (
                            (?:money|numeric|decimal)(?:\(\d+(?:,\d+)?\))?   |
                            int(?:eger)?  |
                            smallint      |
                            datetime\s+$r3\s+to\s+$r3
                        )
                    %ix;
        while ($_ =~ m/$r2/)
        {
            print "Got type: <$1>\n";
            s/$r2//;
        }
        print "Residue 2: <$_>\n";
    }
    else
    {
        print "No match:\n";
    }
    print "Next?\n";
}

We can argue about the use of names like $r1, etc. But it did the job...it was not, and is not, production code.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
6

s/// is not a regex. Thus, you can't pass it as a regex.

I don't like eval for this. It's very fragile, with a lot of bordercases.

I think it's best to take an approach similar to the one JavaScript takes: pass both a regex (in Perl, that is qr//) and a code reference for the substitution. For example, to pass parameters to get the same effect as

s/(\w+)/\u\L$1/g;

You can call

replace($string, qr/(\w+)/, sub { "\u\L$1" }, 'g');

Note that the 'g' modifier is not actually a flag for the regex (I think attaching it to the regex is a design mistake in JavaScript), so I chose to pass it in a third parameter.

Once the API has been decided on, the implementation can be done next:

sub replace {
    my($string, $find, $replace, $global) = @_;
    unless($global) {
        $string =~ s($find){ $replace->() }e;
    } else {
        $string =~ s($find){ $replace->() }ge;
    }
    return $string;
}

Let's try it:

print replace('content-TYPE', qr/(\w+)/, sub { "\u\L$1" }, 'g');

Result:

Content-Type

That looks good to me.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
bart
  • 7,640
  • 3
  • 33
  • 40
  • 1
    Data::Munge on CPAN has a 'replace' function similar to this, though it passes substring matches to the fuction, or parses the replacement as a string. – MkV Oct 29 '10 at 19:38
5
eval "$txt =~ $pattern";

This becomes

eval "\"foo baz\" =~ s/foo/bar/"

and substitutions don't work on literal strings.

This would work:

eval "\$txt =~ $pattern"

but that's not very pleasing. eval is almost never the right solution.

zigdon's solution can do anything, and Jonathan's solution is quite suitable if the replacement string is static. If you want something more structured than the first and more flexible than the second, I'd suggest a hybrid:

sub proc {
    my $pattern = shift;
    my $code = shift;
    my $txt = "foo baz";
    $txt =~ s/$pattern/$code->()/e;
    print "$txt\n";
}

my $pattern = qr/foo/;
proc($pattern, sub { "bar" });   # ==> bar baz
proc($pattern, sub { "\U$&" });  # ==> FOO baz
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
ephemient
  • 198,619
  • 38
  • 280
  • 391
4

Perhaps you might rethink your approach.

You want to pass in to a function a regex substitution, probably because the function will be deriving the text to be operated upon from some other source (reading from a file, socket, etc.). But you're conflating regular expression with regular expression substitution.

In the expression, s/foo/bar/, you actually have a regular expression ("/foo/") and a substitution ("bar") that should replace what is matched by the expression. In the approaches you've tried thus far, you ran into problems trying to use eval, mainly because of the likelihood of special characters in the expression that either interfere with eval or get interpolated (i.e., gobbled up) in the process of evaluation.

So instead, try passing your routine two arguments: the expression and the substitution:

sub apply_regex {
    my $regex = shift;
    my $subst = shift || ''; # No subst string will mean matches are "deleted"

    # Some setup and processing happens...

    # Time to make use of the regex that was passed in:
    while (defined($_ = <$some_filehandle>)) {
        s/$regex/$subst/g; # You can decide if you want to use /g etc.
    }

    # The rest of the processing...
}

This approach has an added benefit: if your regex pattern doesn't have any special characters in it, you can just pass it in directly:

apply_regex('foo', 'bar');

Or, if it does, you can use the qr// quoting-operator to create a regex object and pass that as the first parameter:

apply_regex(qr{(foo|bar)}, 'baz');
apply_regex(qr/[ab]+/, '(one or more of "a" or "b")');
apply_regex(qr|\d+|); # Delete any sequences of digits

Most of all, you really don't need eval or the use of code-references/closures for this task. That will only add complexity that may make debugging harder than it needs to be.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
rjray
  • 5,525
  • 4
  • 31
  • 37
0

I have an extremely simple script for mass file renaming that employs this trick:

#!/opt/local/bin/perl
sub oops { die "Usage : sednames s/old/new [files ..]\n"; }
oops if ($#ARGV < 0);

$regex = eval 'sub { $_ = $_[0]; ' . shift(@ARGV) . '; return $_; }';
sub regex_rename { foreach (<$_[0]>) {
    rename("$_", &$regex($_));
} }

if ($#ARGV < 0) {  regex_rename("*");  }
else {  regex_rename(@ARGV);  }

Any Perl command that modifies $_ like s/old/new could be employed to modify the files.

I decided upon using eval so that the regular expression only needed to be compiled once. There is some wonkiness with eval and $_ that prevented me from using simply:

eval 'sub { ' . shift(@ARGV) . ' }';

Although this &$regex certainly does modify $_, requiring the "$_" to evaluate $_ before calling rename. Yes, eval is quite fragile, like everyone else said.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jeff Burdges
  • 4,204
  • 23
  • 46
0

I found a probably better way to do it:

sub proc {
    my ($pattern, $replacement) = @_;
    my $txt = "foo baz";

    $txt =~ s/$pattern/$replacement/g;  # This substitution is global.
}

my $pattern = qr/foo/;  # qr means the regex is pre-compiled.
my $replacement = 'bar';

proc($pattern, $replacement);

If the flags of the substitution have to be variable, you can use this:

sub proc {
    my ($pattern, $replacement, $flags) = @_;
    my $txt = "foo baz";

    eval('$txt =~ s/$pattern/$replacement/' . $flags);
}

proc(qr/foo/, 'bar', 'g');

Please note that you don't need to escape / in the replacement string.

Aloso
  • 5,123
  • 4
  • 24
  • 41
  • That may work with the particular input (fixed strings), but what if the pattern is "`(\w+)`" and the replacement is "`\u\L$1`"? – Peter Mortensen Apr 28 '21 at 12:43
-1

You're right - you were very close:

eval('$txt =~ ' . "$pattern;");
Jason Plank
  • 2,336
  • 5
  • 31
  • 40
pevgeniev
  • 389
  • 2
  • 13