Predefining a complex replacement for substitution

Question

I am trying to use variables in s///. This example code works as expected:

my $regex1 = "e";
my $regex2 = "2";

my @array = ("one two three", "green blue red");

$_ =~ s/$regex1/$regex2/gee foreach (@array);

print $_ foreach (@array);

However, if I try to do a more complex regex, such as:

my $regex1 = "^(\w)";
my $regex2 = "\u$1";

Then the substitution doesn't work at all. I get the feeling Perl is literally looking for "caret parenthesis backslash" and so on, and not interpreting it as a regex.

That's a very convoluted and (as you've discovered) error-prone way to implement [`ucfirst`](http://perldoc.perl.org/functions/ucfirst.html). — ThisSuitIsBlackNot, Jul 28 '16 at 21:32
This is merely a simplified example for a more complex script involving variable complex regexes generally. — Erik Olson, Jul 28 '16 at 21:54

Ruslan Batdalov · Accepted Answer · 2016-07-28T21:52:21.720

You need to prevent interpolation of meta-characters:

my $regex1 = '^(\w)';
my $regex2 = '"\u$1"';

(Updated according to @ThisSuitIsBlackNot's comment)

The reason is that Perl interpolates double-quoted strings, so your variables $regex1 and $regex2 do not contain what you need:

my $regex1 = "^(\w)";
my $regex2 = "\u$1";
print "$regex1\n"; # ^(w)
print "$regex2\n"; # empty line

So, the substitution operator works as s/^(w)//gee and, of course, fails to find anything.

edited Jul 28 '16 at 21:52

answered Jul 28 '16 at 21:39

`my $regex2 = '\u$1';` isn't enough, you need `my $regex2 = q{"\u$1"};` – ThisSuitIsBlackNot Jul 28 '16 at 21:42
Say the $regex variables are being split from a string input by the user, so they type "first part/second part" and it splits on "/" to produce $regex1 and $regex2 for the s/// line. How can I prevent interpolation in this instance as well? Thanks! – Erik Olson Jul 28 '16 at 21:55
Enclosing it in quotes works, at least with the example you gave: `my ($regex1, $regex2) = split '/', $line; $regex2 = '"' . $regex2 . '"';` By the way, do not forget to `chomp` the input line, otherwise the newline will be considered part of the regexp too. – Ruslan Batdalov Jul 28 '16 at 22:05
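
The suggestion in the comment above can be sketched as a complete script. This is only an illustration: the hard-coded `$line` stands in for a line read from the user, and the wrapping breaks as soon as the replacement itself contains a double quote. It still passes user input to eval, so the warning in the next answer applies.

```perl
use strict;
use warnings;

# Assumed input format: "pattern/replacement" on a single line.
# The trailing newline mimics what <STDIN> would return.
my $line = '^(\w)/\u$1' . "\n";
chomp $line;    # otherwise the newline ends up in the replacement

my ($regex1, $regex2) = split '/', $line;
$regex2 = '"' . $regex2 . '"';    # wrap so eval sees a string literal

my @array = ("one two three", "green blue red");

s/$regex1/$regex2/ee foreach @array;

print "$_\n" foreach @array;
```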

Borodin · Answer 2 · 2016-07-28T22:46:25.363

You really don't want to do this, because allowing people to pass Perl code into your program that will be given to eval isn't a nice thing to do. Apart from being horribly complex, it will open you to malice unless you check the input carefully. If someone typed something like aaa/unlink glob("*") then the necessary /ee would delete the files in your current directory
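
To show the risk without deleting anything, here is a minimal sketch with a harmless payload. The input line is made up for the demonstration; the point is only that whatever follows the slash is run as Perl code by the second /e.

```perl
use strict;
use warnings;

# Hypothetical user input: the "replacement" half is arbitrary Perl code
my $line = 'e/print STDERR "arbitrary code ran\n"; "X"';
my ($regex, $replacement) = split m{/}, $line, 2;

my $text = 'one two three';

# The first /e yields the contents of $replacement,
# the second /e hands that string to eval
$text =~ s/$regex/$replacement/ee;

print "$text\n";    # onX two three (and the message appears on STDERR)
```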

Let's clear something up first. In s/PATTERN/REPLACEMENT/, only PATTERN is a regex. REPLACEMENT is a simple string evaluated as if it were in double quotes

So let's write your program like this. I've put all of your strings in single quotes as you don't want to use escape sequences or variable interpolation. I've also changed your /eeg modifiers to just /g. It looks like you were spraying /e around in the hope that it might work, and that's no way to write software

use strict;
use warnings 'all';

my $regex       = 'e';
my $replacement = '2';

my @array = ('one two three', 'green blue red');

s/$regex/$replacement/g for @array;

print "$_\n" for @array;

output

on2 two thr22
gr22n blu2 r2d

Now you wanted to change this to

my $regex       = "^(\w)";
my $replacement = "\u$1";

and this is why I threw out your double quotes. Perl tries to compile "^(\w)" and sees \w as an escape sequence that it doesn't recognise, so you get

Unrecognized escape \w passed through

and it assumes that you meant just w. Unless you want to escape the backslashes like "^(\\w)" you need single quotes to represent the string ^(\w)

A similar thing applies to $replacement = "\u$1";

The first thing you'll see is Perl trying to interpolate the current value of $1 into the double-quoted string. It's currently undefined, so you get

Use of uninitialized value $1 in ucfirst

But even so it obliges and uses the empty string for $1 and then upper-cases it for you leaving ... the empty string

So now you have set

$regex       = '^(w)';
$replacement = '';

so it's unsurprising that nothing works

Let's do your program again, but this time using single quotes so that nothing gets messed with

use strict;
use warnings 'all';

my $regex       = '^(\w)';
my $replacement = '\u$1';

my @array = ('one two three', 'green blue red');

s/$regex/$replacement/g for @array;

print "$_\n" for @array;

Now $regex really is ^(\w) and $replacement really is \u$1. What can go wrong?

It works fine. We get

\u$1ne two three
\u$1reen blue red

which is exactly what we asked for

But now your /e modifier comes in useful. A single /e evaluates the REPLACEMENT as an expression. That would be useful if we wanted to stick $1 . 'xxx' or similar in there, but since the expression is $replacement we get no advantage at all: the expression $replacement is the same as interpolating $replacement
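
A small sketch of both cases may help here. The first substitution uses a real expression as the replacement; the second uses a lone variable, which is the situation described above. The variable name `$copy` is only for illustration.

```perl
use strict;
use warnings;

my @array = ('one two three', 'green blue red');

# With /e the replacement is compiled as Perl code, so an expression works
s/^(\w)/$1 . 'xxx'/e for @array;
print "$_\n" for @array;    # oxxxne two three, gxxxreen blue red

# A lone variable as the expression just yields its contents, unevaluated
my $replacement = '\u$1';
my $copy = 'one two three';
$copy =~ s/^(\w)/$replacement/e;
print "$copy\n";            # \u$1ne two three
```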

Do we need another /e? That will call eval on the result of the first /e, so we're asking for eval '\u$1', and that won't compile since \u$1 isn't a viable Perl program, so eval returns undef and we get

Use of uninitialized value in substitution iterator

The solution is to make $replacement into a compilable program. Putting double quotes around it, like "\u$1", turns it into a very short Perl program which returns the current value of $1 with the first character upper-cased

We need to set $replacement to that string, including the double quotes and avoiding the processing of escaped characters and $1 as before. If I write

my $replacement = '"\u$1"';

then I get exactly the string "\u$1" including the double-quotes
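
The difference between the two strings can be checked in isolation by calling eval directly after a match. This is a minimal sketch; the variable names `$bare` and `$quoted` are mine, not part of the original program.

```perl
use strict;
use warnings;

# A successful match sets $1 to "o"
'one two three' =~ /^(\w)/ or die "no match";

my $bare   = eval '\u$1';      # not a usable Perl expression: eval gives undef
my $quoted = eval '"\u$1"';    # a double-quoted string literal: eval gives "O"

print defined $bare ? "bare: $bare\n" : "bare: undef\n";    # bare: undef
print "quoted: $quoted\n";                                  # quoted: O
```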

Now let's try

use strict;
use warnings 'all';

my $regex       = '^(\w)';
my $replacement = '"\u$1"';

my @array = ('one two three', 'green blue red');

s/$regex/$replacement/eeg for @array;

print "$_\n" for @array;

output

One two three
Green blue red

As I said, you really don't want to do this!
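
If the replacement has to come from outside the program, one way to avoid eval altogether is to support only a small, fixed template syntax and expand it yourself. The following is a sketch under stated assumptions, not a general solution: `expand_template` is a hypothetical helper, it understands only $1 to $9 with an optional \u directly in front, and it replaces just the first match in each string.

```perl
use strict;
use warnings;

# Hypothetical helper: expands $1..$9, optionally preceded by \u, in a
# user-supplied template. Nothing is ever passed to eval.
sub expand_template {
    my ($template, @captures) = @_;    # $captures[0] plays the role of $1
    (my $out = $template) =~ s{(\\u)?\$([1-9])}{
        my $value = $captures[$2 - 1] // '';
        $1 ? ucfirst($value) : $value;
    }ge;
    return $out;
}

my $regex       = '^(\w)';
my $replacement = '\u$1';

my @array = ('one two three', 'green blue red');

for my $string (@array) {
    # A list-context match returns the captured groups
    # (a pattern without groups returns (1), which this sketch ignores)
    if (my @captures = $string =~ /$regex/) {
        my ($start, $length) = ($-[0], $+[0] - $-[0]);
        my $new = expand_template($replacement, @captures);
        substr($string, $start, $length, $new);    # splice in the expansion
    }
}

print "$_\n" for @array;
```

With the same inputs as before this prints One two three and Green blue red, and a replacement such as unlink glob("*") is simply inserted as text instead of being run.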
