I'm writing a PCRE regular expression to 'minify' other PCRE regular expressions written in free-spacing and comments mode (the /x flag), such as:
# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d # year (group 1)
[- /.] # separator - dash, space, slash or period
(0[1-9]|1[012]) # month (group 2)
[- /.] # separator - dash, space, slash or period
(0[1-9]|[12][0-9]|3[01]) # day (group 3)
Note: I've intentionally omitted the regular expression delimiters and the x flag.
'Minifying' the above expression should remove all literal whitespace characters (including newlines) and all comments, while keeping literal spaces inside a character class (e.g. [- /.]) and escaped whitespace characters (e.g. "\ ", a backslash followed by a space):
(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])
This is the regular expression I have so far, itself written in free-spacing and comments mode (https://regex101.com/r/RHnyWw/2/):
(?<!\\)\s # Match any non-escaped whitespace character
|
(?<!\\)\#.*\s*$ # Match comments (any text following non-escaped #)
If I substitute every match with an empty string, the result is:
(19|20)\d\d[-/.](0[1-9]|1[012])[-/.](0[1-9]|[12][0-9]|3[01])
This is close, except that the separator parts of the pattern, [- /.], have lost their literal space.
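To make the problem reproducible outside regex101, here is a minimal sketch in Python. It assumes that Python's re module behaves like PCRE for the constructs used here, with re.VERBOSE standing in for the x flag and re.MULTILINE for the m flag so that $ can match at line ends. The variable names are mine, for illustration only.

```python
import re

# The date pattern from above, written in free-spacing and comments mode.
verbose_pattern = r"""# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d # year (group 1)
[- /.] # separator - dash, space, slash or period
(0[1-9]|1[012]) # month (group 2)
[- /.] # separator - dash, space, slash or period
(0[1-9]|[12][0-9]|3[01]) # day (group 3)
"""

# My current minifier, unchanged. Assumption: re.VERBOSE plays the role of
# the x flag and re.MULTILINE lets $ match at the end of each line.
minifier = re.compile(
    r"""
    (?<!\\)\s          # Match any non-escaped whitespace character
    |
    (?<!\\)\#.*\s*$    # Match comments (any text following non-escaped #)
    """,
    re.VERBOSE | re.MULTILINE,
)

# Substitute every match with an empty string.
minified = minifier.sub("", verbose_pattern)
print(minified)
```

Under those assumptions this prints the same output as shown above: the comments and newlines are gone, but so is the space inside each [- /.] character class.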
How can I change this pattern so that literal space (and #) characters between [ and ] are preserved?