
I have a list of patterns I want to look for in a string. These patterns are numerous and contain numerous metacharacters that I want to just match literally. So this is the perfect application for metaquoting with \Q..\E. The complication is that I need to join the variable list of patterns into a regular expression.

use strict;
use warnings;
# sample string to represent my problem
my $string = "{{a|!}} Abra\n{{b|!!}} {{b}} Hocus {{s|?}} Kedabra\n{{b|+?}} {{b|??}} Pocus\n {{s|?}}Alakazam\n";

# sample patterns to look for
my @patterns = qw({{a|!}} {{s|?}} {{s|+?}} {{b|?}});
# since these patterns can be anything, I join the resulting array into a variable-length regex
my $regex = join("|",@patterns);

my @matched = $string =~ /$regex(\s\w+\s)/; # Error in matching regex due to unquoted metacharacters
print join("", @matched); # intended result: Hocus\n Pocus\n

When I attempt to introduce metaquoting into the joining operation, it appears to have no effect.

# quote all patterns so that they match literally, but make sure the alternating metacharacter works as intended
my $qmregex = "\Q".join("\E|\Q", @patterns)."\E";

my @matched = $string =~ /$qmregex(\s\w+\s)/; # The same error

For some reason the metaquoting has no effect when it is part of the string I use as the regular expression. It only works for me when \Q and \E are written directly in a regex, as in /\Q$anexpression\E/, but as far as I can tell that isn't an option here. How do I get around this?

MattLBeck

1 Answer


I don't understand your expected result, as Abra and Kedabra are the only strings preceded by any of the patterns.

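To solve your problem you must escape each pattern separately. \Q and \E are processed when a double-quoted string or regex literal is interpolated, and they affect only the text between them within that same literal. They are not stored in the resulting string, so "\Q" and "\E" are just the empty string "", and "\E|\Q" is just "|". You could write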

my $qmregex = join '|', map "\Q$_\E", @patterns;

but it is simpler to call the quotemeta function.
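To make the point about "\Q" and "\E" concrete, here is a minimal sketch comparing the two ways of building the alternation. The variable names $unquoted and $quoted and the shortened pattern list are only for illustration, and the `no warnings 'misc'` line is there on the assumption that these literals otherwise trigger a "Useless use of \E" warning.

```perl
use strict;
use warnings;

my @patterns = qw( {{a|!}} {{s|?}} );

my ($unquoted, $quoted);
{
    # Assumption: a bare \E in a string literal warns under 'misc'; silence it here.
    no warnings 'misc';

    # \Q and \E are consumed when each literal is compiled, so "\Q" and "\E"
    # contribute nothing and "\E|\Q" is just "|". The patterns are joined as-is.
    $unquoted = "\Q" . join("\E|\Q", @patterns) . "\E";
}

# quotemeta operates on the value of each pattern, so the escaping survives the join.
$quoted = join '|', map { quotemeta } @patterns;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

The first string still contains the raw metacharacters, while the second has every non-word character backslashed and only the joining | left active.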

You must also enclose the list in a non-capturing group (?:...) to isolate the alternation, and apply the /g modifier to the regex match to find all occurrences within the string.
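As a rough illustration of why the grouping matters, the sketch below uses a made-up string and two made-up patterns (foo and bar, not from the question). Without (?:...), the capture belongs only to the last alternative, so a match on an earlier alternative captures nothing.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to keep the example small.
my $text = "foo one bar two ";
my $alt  = join '|', map { quotemeta } qw(foo bar);

# Parsed as  foo | bar(\s\w+\s) : matching "foo" leaves $1 undefined.
my @ungrouped = $text =~ /$alt(\s\w+\s)/g;

# Parsed as  (?:foo|bar)(\s\w+\s) : every alternative is followed by the capture.
my @grouped = $text =~ /(?:$alt)(\s\w+\s)/g;

# Show undefined captures explicitly instead of printing undef.
print "ungrouped: ", join(",", map { defined $_ ? "[$_]" : "[undef]" } @ungrouped), "\n";
print "grouped:   ", join(",", map { defined $_ ? "[$_]" : "[undef]" } @grouped), "\n";
```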

Try

use strict;
use warnings;

my $string = "{{a|!}} Abra\n{{b|!!}} {{b}} Hocus {{s|?}} Kedabra\n{{b|+?}} {{b|??}} Pocus\n {{s|?}}Alakazam\n";

my @patterns = qw(  {{a|!}} {{s|?}} {{s|+?}} {{b|?}}  );

my $regex = join '|', map quotemeta, @patterns;
my @matched = $string =~ /(?:$regex)(\s\w+\s)/g;
print @matched;

output

 Abra
 Kedabra
Borodin