
I'm trying to learn Atom's syntax highlighting/grammar rules, which rely heavily on JS regular expressions, and came across an unfamiliar pattern in the Python grammar file.

The pattern starts with (?x), a construct I don't recognize. I looked it up in an online regex tester, which reports it as invalid. My first thought was that it represents an optional left paren, but I believe the paren would have to be escaped for that.

Does this only have meaning in Atom's CoffeeScript grammar, or am I overlooking a regex meaning?

(This pattern also appears in the TextMate language file that I believe Atom's grammar came from.)

beardc

2 Answers


If that regular expression gets processed in Python, it'll be compiled with the 'verbose' flag.

From the Python re docs:

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
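As a minimal sketch of what the docs describe, the snippet below compiles the same pattern twice, once with the inline (?x) flag and once with re.VERBOSE passed to re.compile(). The pattern itself (digits followed by a percent sign) and the sample text are made up for illustration and are not taken from the Atom grammar.

```python
import re

# Hypothetical pattern for illustration: digits followed by a percent sign.
# Variant 1: the verbose flag is set inline, at the very start of the pattern.
inline = re.compile(r"""(?x)
    \d+   # one or more digits
    %     # a literal percent sign
""")

# Variant 2: the same pattern, with the flag passed to re.compile() instead.
explicit = re.compile(r"""
    \d+   # one or more digits
    %     # a literal percent sign
""", re.VERBOSE)

text = "My value: 56%"
inline_match = inline.search(text).group(0)
explicit_match = explicit.search(text).group(0)
print(inline_match, explicit_match)

# The compiled pattern reports the inline flag as part of its flags.
print(bool(inline.flags & re.VERBOSE))
```

Both variants should behave identically; the inline form is convenient when the pattern is stored as plain data (as in a grammar file) and there is no place to pass a separate flag argument.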

Bryant
    Thanks, I thought it was using JS regexes, but looking further it looks like Atom has modified their own regex engine, which likely includes this feature. – beardc Sep 25 '15 at 14:11

The JavaScript regex engine does not support the VERBOSE modifier x, either inline or as a regular flag.

See Free-Spacing: x (except JavaScript) at rexegg.com:

By default, any space in a regex string specifies a character to be matched. In languages where you can write regex strings on multiple lines, the line breaks also specify literal characters to be matched. Because you cannot insert spaces to separate groups that carry different meanings (as you do between phrases and paragraphs when you write in English), a regex can become hard to read...

Luckily, many engines support a free-spacing mode that allows you to aerate your regex. For instance, you can add spaces between the tokens.

You may also see it called whitespace mode, comment mode or verbose mode.

Here is how it can look in Python:

import re
regex = r"""(?x)
\d+                # Digits
\D+                # Non-digits up to...
$                  # The end of string
"""
print(re.search(regex, "My value: 56%").group(0)) # => 56%
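One consequence worth illustrating: because free-spacing mode ignores whitespace in the pattern, a space that is meant to be matched literally has to be escaped or placed in a character class. The following is a small sketch using Python's re module; the patterns and sample text are illustrative assumptions only.

```python
import re

text = "My value: 56%"

# In verbose mode the unescaped spaces are dropped, so this pattern is
# effectively "value:\d+" and cannot match the space after the colon.
ignored = re.search(r"(?x) value: \d+", text)

# A backslash-escaped space is kept as a literal space.
escaped = re.search(r"(?x) value: \  \d+", text)

# A space inside a character class is also kept.
in_class = re.search(r"(?x) value: [ ] \d+", text)

print(ignored)
print(escaped.group(0))
print(in_class.group(0))
```

The first search finds nothing, while the other two match the text including the space.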
Wiktor Stribiżew
    I think the example and alternate documentation reference add to the question, so I would keep it. My big hangup was identifying the verbose mode flag, so either answer was helpful in that aspect. Thanks. – beardc Sep 25 '15 at 15:12