The regex contains a capture group, but the substitution string is not interpolated, so the match variable $1 in it is never expanded:

use strict;
use warnings;

my $regex = '([^ ]+)e s';
my $subst = '$1 ';

my $text = 'fine sand';

print $text =~ s/$regex/$subst/r;
print "\n";

The result is

$1 and

The solution to "Perl regular expression variables and matched pattern substitution" suggests using the e modifier together with eval in the substitution, and indeed

print $text =~ s/$regex/eval $subst/er;

would give the desired

finand

However, in my situation, the pattern and substitution strings are read from third-party user input, so they cannot be considered safe for eval. Is there a way to interpolate the substitution string that is more secure than executing it as code? All I need is to expand the match variables contained in the substitution string.

The best I can currently think of involves an idiom like

$text =~ /$regex/;
sprintf $subst, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, ...

This would require a slight change in syntax for the substitution string, but I consider this acceptable. However, the set of imaginable match variables is infinite, in particular named match variables would not be supported.
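
A minimal sketch of that idiom, assuming Perl 5.26 or later so that @{^CAPTURE} can stand in for the open-ended list of numbered match variables. The name subst_sprintf is hypothetical, and the substitution string uses sprintf's explicit-index syntax (%1$s) instead of $1. Named captures are still not covered, and the format string remains user-controlled, so this is an illustration rather than a vetted solution.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the substitution string as a sprintf format
# and feed it every numbered capture of the match.
sub subst_sprintf {
    my ($text, $regex, $format) = @_;
    return $text =~ s{$regex}{
        my @captured = @{^CAPTURE};    # copy of $1, $2, ... (Perl 5.26+)
        sprintf($format, @captured);
    }er;
}

print subst_sprintf('fine sand', '([^ ]+)e s', '%1$s '), "\n";
```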

2 Answers

use String::Substitution qw( sub_copy );

print sub_copy( $text, $regex, $subst );

It handles replacement strings such as '\$1.00' and '${1}00' the way Perl would.

Note that while this is safe from accidental/malicious information leakage and accidental/malicious code execution, it's not safe in every sense of the word. Specifically, it's quite easy to craft a regex and string combination that will take longer than the lifespan of the universe to match.

ikegami

Here's a solution:

  • use capture groups to pick up all the groups
  • replace $\d+ in $subst with the entries of the capture groups
  • now do the substitution using the interpolated $subst.

use strict;
use warnings;

my $regex = '([^ ]+)e s';
my $subst = '$1 ';

my $text = 'fine sand';

print $text =~ s{$regex}{
    my @captured = @{^CAPTURE};
    $subst =~ s/\$([1-9]\d*)/$captured[$1-1]/rg
}er . "\n";

The inner substitution is safe, since it only matches digits and uses them to index into @captured. You'll need to add bounds checking. @ikegami correctly notes that it doesn't handle an escaped $ either, which isn't hard to address.
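
The bounds checking and escape handling mentioned above could look roughly like the following. This is a sketch under assumptions, not a tested replacement for String::Substitution: the helper names expand_template and safe_subst are hypothetical, a backslash is assumed to make the next character literal, a group that does not exist or did not participate in the match is assumed to expand to the empty string, and @{^CAPTURE} requires Perl 5.26 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $N and ${N} in a template from a list of
# captures. A backslash makes the following character literal, so '\$1'
# yields the two characters '$1'. Anything else is left untouched.
sub expand_template {
    my ($template, @captured) = @_;
    return $template =~ s/\\(.)|\$(?:\{([1-9][0-9]*)\}|([1-9][0-9]*))/
        my $n = defined $2 ? $2 : $3;
        defined $1                  ? $1
      : $n > @captured              ? ''
      : defined $captured[$n - 1]   ? $captured[$n - 1]
      :                               '';
    /gser;
}

# Hypothetical wrapper: substitute the first match of $regex in $text,
# expanding only match variables in $subst and never evaluating it as code.
sub safe_subst {
    my ($text, $regex, $subst) = @_;
    return $text =~ s{$regex}{
        my @captured = @{^CAPTURE};    # copy before the inner match runs
        expand_template($subst, @captured);
    }er;
}

print safe_subst('fine sand', '([^ ]+)e s', '$1 '), "\n";
```

Copying @{^CAPTURE} before calling the helper matters because the inner substitution runs its own match and would otherwise overwrite the outer captures.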

Note also that using an untrusted $regex creates a risk of denial-of-service attacks, since a crafted pattern can take an extremely long time to match.

ikegami
craigb
  • Good point on missing the escaping mechanism. However, when I test your first case I get `$x-$2`. Yes, `String::Substitution` is a better solution. – craigb Nov 23 '22 at 04:57
  • shoot. my bad. – ikegami Nov 23 '22 at 05:01
  • [Earlier comment with misinformation removed]: It probably should provide an escaping mechanism, so we can do `my $subst = '\$1.00';` and `my $subst = '${1}00';`. And the repeated matches against the same string are needlessly wasteful. – ikegami Nov 23 '22 at 05:04
  • Fixed the performance issue. – ikegami Nov 23 '22 at 05:06