The Perl documentation for the /m and /s modifiers and for character classes could benefit from connecting the dots and from a few more examples, which I will attempt to provide here.
Regardless of the /m and /s modifiers, a character class matches a newline whenever the newline is in the class, and a negated class such as [^B] includes it. That's why [^B]* matches \n and gets extended through multiple newlines in your case. In fact, you can specify a character class that explicitly contains ([\n]) or does not contain ([^\n]) a newline. In addition to the newline character (\n), there is also the non-newline escape (\N), which matches any single character except a newline.
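A minimal sketch of the points above, assuming Perl 5.12 or later (for \N) and using a string from the examples further down. The show helper is my own, only there to print embedded newlines visibly; it is not part of any library.

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: show embedded newlines as a visible "\n" when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return "'$s'";
}

my $string = "abXd\nabcd\n";

# [^c]* is not affected by /m or /s: it consumes the newline as well.
my ($across) = $string =~ /^([^c]*)/;

# [^c\n]* explicitly excludes the newline, so it stops at the end of line 1.
my ($first_line) = $string =~ /^([^c\n]*)/;

# \N (Perl 5.12+) matches any character except a newline, so \N* stops there too.
my ($non_newlines) = $string =~ /^(\N*)/;

say show($across);         # 'abXd\nab'
say show($first_line);     # 'abXd'
say show($non_newlines);   # 'abXd'
```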
The /s modifier only alters the behavior of . (it allows . to match a newline). It does not alter the behavior of any other character classes.
You can get markedly different behavior depending on whether you use /m, /s, both, or neither, as shown in the examples below. This behavior is as documented, and hence predictable, but not always intuitive. I typically use these modifiers together (/ms), and have found that this makes my code more intuitive and maintainable. This way, I do not have to think every time about the newline matching behavior. In fact, I use the /xms modifiers in most regexes in my own code as a matter of habit, with /x allowing the code to be more readable and maintainable (Conway (2005), pp. 236-241; Vromans (2006)).
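As a sketch of that habit (my own rewrite, assuming Perl 5.10 or later for say), here is the pattern used in the examples below written with /xms, so that each part can carry a comment. Note that comments inside a /x pattern are still subject to variable interpolation, so they should avoid $ and @.

```perl
use strict;
use warnings;
use feature qw( say );

my $string = "abcd\nabcd\n";

# Same pattern as '^([^c]*)(c.*?)$' in the examples, written with /xms:
# /x allows whitespace and comments, /m makes ^ and the end anchor work per line,
# and /s lets '.' match a newline.
my @matches = $string =~ m{
    ^               # start of a line (/m)
    ( [^c]* )       # everything up to the first 'c' (may include newlines)
    ( c .*? )       # the 'c' and as little as possible after it
    $               # end of a line (/m)
}xms;

say join ' ', map { "'$_'" } @matches;   # 'ab' 'cd'
```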
REFERENCES:
perlrecharclass - Perl Regular Expression Character Classes: Backslash sequences
\N Match a character that isn't a newline.
perlreref - Perl Regular Expressions Reference: CHARACTER CLASSES
\N A non newline (when not followed by '{NAME}';;
not valid in a character class; equivalent to [^\n]; it's
like '.' without /s modifier)
perlre - Perl regular expressions: Modifiers
m
Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.
s
Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
Used together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.
(Note that it says nothing about /m or /s altering any character class other than ".", so we can infer from here that the other classes are not altered.)
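A short sketch of the /m behavior quoted above (assuming Perl 5.10 or later for say): the same line-matching pattern finds nothing in a multi-line string without /m, and every line with it.

```perl
use strict;
use warnings;
use feature qw( say );

my $string = "one\ntwo\nthree\n";

# Without /m, ^ matches only at the start of the string and the end anchor only
# at the end (or just before a final newline), so no line of this string matches.
my @without_m = $string =~ /^(\w+)$/g;

# With /m, ^ and the end anchor match at the start and end of every line.
my @with_m = $string =~ /^(\w+)$/mg;

say scalar @without_m;        # 0
say join ',', @with_m;        # one,two,three
```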
Using /xms modifiers:
- Always use the /x flag.
- Always use the /m flag.
- Always use the /s flag.
(Conway (2005), pp. 236-241; Vromans (2006))
Damian Conway (2005) Perl Best Practices: Standards and Styles for Developing Maintainable Code. O'Reilly Media. https://www.amazon.com/Perl-Best-Practices-Developing-Maintainable/dp/0596001738/
Johan Vromans (2006) Perl Best Practices: Reference Guide. https://www.squirrel.nl/pub/PBP_refguide-1.02.00.pdf
EXAMPLES:
use strict;
use warnings;
use feature qw( say );
my @strings = (
    "abcd\n",          # single-line string
    "abcd\nabcd\n",    # multi-line string (first string repeated twice)
    "abXd\nabcd\n",    # multi-line string, as above, but with the first 'c' replaced by 'X'
    "abcd\nabXd\n",    # multi-line string, as above, but with the second 'c' replaced by 'X'
);
my @regexes = ( '^([^c]*)(c.*?)$' );
foreach my $string ( @strings ) {
    foreach my $regex ( @regexes ) {
        my @matches;
        say "\n###";
        say "# \$string='$string'; \$regex='$regex'";
        @matches = map { "'$_'" } $string =~ /$regex/;
        say "regex_modifiers=''; \@matches=@matches;";
        @matches = map { "'$_'" } $string =~ /$regex/m;
        say "regex_modifiers='m'; \@matches=@matches;";
        @matches = map { "'$_'" } $string =~ /$regex/s;
        say "regex_modifiers='s'; \@matches=@matches;";
        @matches = map { "'$_'" } $string =~ /$regex/ms;
        say "regex_modifiers='ms'; \@matches=@matches;";
    }
}
Output:
###
# $string='abcd
'; $regex='^([^c]*)(c.*?)$'
regex_modifiers=''; @matches='ab' 'cd'; # ok
regex_modifiers='m'; @matches='ab' 'cd'; # /m, /s modifiers do not matter in a single-line string
regex_modifiers='s'; @matches='ab' 'cd'; # /m, /s modifiers do not matter in a single-line string
regex_modifiers='ms'; @matches='ab' 'cd'; # /m, /s modifiers do not matter in a single-line string
###
# $string='abcd
abcd
'; $regex='^([^c]*)(c.*?)$'
regex_modifiers=''; @matches=; # '.' does not match newline, cannot reach end of string
regex_modifiers='m'; @matches='ab' 'cd'; # '$' matches first newline
regex_modifiers='s'; @matches='ab' 'cd
abcd'; # '.' matches newline, so the end of string is reached
# and '$' matches it.
regex_modifiers='ms'; @matches='ab' 'cd'; # non-greedy '.*?' causes '$' to match the first newline
###
# $string='abXd
abcd
'; $regex='^([^c]*)(c.*?)$'
regex_modifiers=''; @matches='abXd
ab' 'cd'; # [^c] matches newline, /m, /s modifiers do not matter
regex_modifiers='m'; @matches='abXd
ab' 'cd'; # [^c] matches newline, /m, /s modifiers do not matter
regex_modifiers='s'; @matches='abXd
ab' 'cd'; # [^c] matches newline, /m, /s modifiers do not matter
regex_modifiers='ms'; @matches='abXd
ab' 'cd'; # [^c] matches newline, /m, /s modifiers do not matter
###
# $string='abcd
abXd
'; $regex='^([^c]*)(c.*?)$'
regex_modifiers=''; @matches=; # '.' does not match newline, cannot reach end of string
regex_modifiers='m'; @matches='ab' 'cd'; # '$' matches the first newline, so this is a match on the first line
regex_modifiers='s'; @matches='ab' 'cd
abXd'; # '.' matches newline, so the end of string is reached
# and '$' matches it.
regex_modifiers='ms'; @matches='ab' 'cd'; # non-greedy '.*?' causes '$' to match the first newline
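Tying this back to character classes: if you want the first group to stay on one line, you can exclude the newline from the class yourself. A minimal sketch, assuming Perl 5.10 or later for say; the show helper is a hypothetical printing aid, and [^c\n] is just one way to do this ((?:(?!c)\N)* would be another).

```perl
use strict;
use warnings;
use feature qw( say );

# Hypothetical helper: show captured groups with embedded newlines as a visible "\n".
sub show {
    my @parts = @_;
    s/\n/\\n/g for @parts;
    return join ' ', map { "'$_'" } @parts;
}

my $string = "abXd\nabcd\n";

# The pattern from the examples: [^c]* spans the newline, so the match
# starts at the beginning of the string.
my @spanning = $string =~ /^([^c]*)(c.*?)$/ms;

# Excluding \n from the class keeps group 1 on one line; with /m the
# match can then start at the beginning of the second line.
my @per_line = $string =~ /^([^c\n]*)(c.*?)$/ms;

# Without /m, ^ only matches at the start of the string, so there is no match.
my @per_line_no_m = $string =~ /^([^c\n]*)(c.*?)$/;

say show(@spanning);           # 'abXd\nab' 'cd'
say show(@per_line);           # 'ab' 'cd'
say scalar @per_line_no_m;     # 0
```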