
I am reading whole Python files as single strings, compiling the expressions in multiline-mode. So far I have been able to match single variable assignment with Python regex:

"^\s*[A-Za-z_][A-Za-z_0-9]*\s*(?=\=)(?!==)"
  • ^\s*: First it checks that the variable assignment starts on a new line, optionally preceded by whitespace. I do this to prevent syntax such as foo(required=True, thud=3) from matching, as I do not count those as variable assignments.

  • [A-Za-z_][A-Za-z_0-9]*: Then it looks for a valid variable name...

  • \s*(?=\=)(?!==): ...and checks that the variable name is followed by = and not by ==, since == is a comparison and not a variable assignment.
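
To make the behaviour described above concrete, here is a minimal sketch that compiles the same pattern with re.MULTILINE and runs it through re.finditer(). The SAMPLE text and the helper name single_assignment_names are made up for illustration; the leading and trailing whitespace that the pattern consumes is stripped from each match.

```python
import re

# The pattern from the question, compiled in multiline mode.
PATTERN = re.compile(r"^\s*[A-Za-z_][A-Za-z_0-9]*\s*(?=\=)(?!==)", re.MULTILINE)

# Hypothetical sample source, invented for this illustration.
SAMPLE = (
    "x = 1\n"
    "foo(required=True, thud=3)\n"
    "if x == 1:\n"
    "    y = 2\n"
    "a, b = 4, 5\n"
)


def single_assignment_names(source):
    """Return the names the pattern matches, without surrounding whitespace."""
    return [match.group().strip() for match in PATTERN.finditer(source)]


print(single_assignment_names(SAMPLE))
```

The keyword arguments and the == comparison are skipped as intended, but so is the a, b = 4, 5 line, because the name at the start of that line is followed by a comma rather than by =.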

This works fine, but not for assignment of multiple variables in a single line:

a, b = 4, 5

In this case the regex will not match either of the variables. Note that it should not match the comma, only a and b as separate matches. How can I do this?

Tomerikoo
Snusifer
  • Try `(?:(?<=\W)|^)[^\W\d]\w*\s*(?:,\s*[^\W\d]\w*\s*)*(?==(?!=))` – ctwheels Oct 15 '19 at 20:15
  • You are saying should not match commas, only variable names, but I can't see capturing groups in your regex. – Wagner Macedo Oct 15 '19 at 20:18
  • @ctwheels That was really fast haha. Problem is it matches `a, b`, and not `a` and `b` separately... – Snusifer Oct 15 '19 at 20:19
  • Ah, when I have difficulties in regex I always use this service https://regex101.com, remember to mark Python in sidebar as the regex flavor. – Wagner Macedo Oct 15 '19 at 20:20
  • @Snusifer best approach would then be to split on `\s*,\s*` – ctwheels Oct 15 '19 at 20:22
  • @WagnerMacedo regex101 is what i am using. You can match them separately without groups, using re.finditer() – Snusifer Oct 15 '19 at 20:23
  • With the BOS part `^\s*` the regex is restricted to start at one place. With the `(?==)(?!==)` part the regex is forced to match an equals sign at the end. So it is not going to match anything but, for example, this: `" _BDK90hr ="` –  Oct 15 '19 at 20:34
  • Python supports many more syntaxes: `[a]=[a,*b]=(a,[b])=a,=…`. – Davis Herring Oct 16 '19 at 02:31
  • Are you extracting variable names? You should at least show your code to explain what you mean by matching a and b separately. – Charif DZ Oct 16 '19 at 08:11
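
The two-step idea from the comments (match the whole comma-separated target list, then split it on `\s*,\s*`) could look roughly like the sketch below. It is not ctwheels' exact pattern: as an assumption, it stays anchored to the start of a line so that keyword arguments keep being ignored, and the names TARGET_LIST, SAMPLE and assigned_names are invented for the example.

```python
import re

# One or more comma-separated names at the start of a line, followed by a
# single "=" (not "=="). Group 1 holds the whole target list, e.g. "a, b".
TARGET_LIST = re.compile(
    r"^[ \t]*([^\W\d]\w*(?:\s*,\s*[^\W\d]\w*)*)\s*=(?!=)",
    re.MULTILINE,
)

# Hypothetical sample source, invented for this illustration.
SAMPLE = (
    "x = 1\n"
    "foo(required=True, thud=3)\n"
    "if x == 1:\n"
    "    y = 2\n"
    "a, b = 4, 5\n"
)


def assigned_names(source):
    """Collect every name from each matched target list, one entry per name."""
    names = []
    for match in TARGET_LIST.finditer(source):
        names.extend(re.split(r"\s*,\s*", match.group(1)))
    return names


print(assigned_names(SAMPLE))
```

This still only covers plain name, name = ... lines. Bracketed, parenthesised, starred or chained targets like the ones Davis Herring lists are out of reach for it.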

1 Answer


Python accepts any Unicode character classified as a letter or a digit, plus the underscore, in variable names (a name still cannot start with a digit). The standard re module that ships with Python does not support the escape sequences needed to build a regex pattern that matches such a name exactly, but the third-party regex module does.
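
As a small illustration of why the ASCII-only character class is too narrow, the sketch below compares it with the built-in str.isidentifier() check, combined with keyword.iskeyword() to reject reserved words. It does not use the regex module; the helper name is_variable_name and the candidate strings are made up for the example.

```python
import keyword
import re

# The ASCII-only name pattern used in the question.
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_variable_name(text):
    """True if Python itself would accept text as a (non-keyword) identifier."""
    return text.isidentifier() and not keyword.iskeyword(text)


# Print, per candidate: ASCII pattern verdict, then Python's own verdict.
for candidate in ["foo", "größe", "2fast", "class"]:
    ascii_ok = ASCII_NAME.fullmatch(candidate) is not None
    print(candidate, ascii_ok, is_variable_name(candidate))
```

The ASCII pattern rejects größe although Python accepts it, and it accepts class although that is a keyword.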

Check out my answer to the question "How to check if a string is a valid variable name in Python?" for an appropriate regex pattern, which you can then use when trying to construct the regex pattern for the assignment.

One of the methods for validating a Python variable name that I used in that answer relies on the tokenize module. I recommend looking at that module for inspiration, and using it to achieve your goal instead of searching for a regular expression that duplicates Python's parsing capabilities.
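
To show what letting Python do the parsing can look like, here is a rough sketch. Note that it swaps in the standard-library ast module rather than tokenize, on the assumption that the files are syntactically valid and that a full parse is acceptable. The function name assigned_names and the SAMPLE text are invented for the example.

```python
import ast

# Hypothetical sample source, invented for this illustration.
SAMPLE = (
    "x = 1\n"
    "foo(required=True, thud=3)\n"
    "if x == 1:\n"
    "    y = 2\n"
    "a, b = 4, 5\n"
    "[c, *d] = [1, 2, 3]\n"
)


def assigned_names(source):
    """Return the names bound by plain "=" assignments, in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue  # keyword arguments and comparisons are not Assign nodes
        for target in node.targets:
            # A target may be a name, a tuple/list of names, a starred name...
            for sub in ast.walk(target):
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    found.append((sub.lineno, sub.col_offset, sub.id))
    return [name for _, _, name in sorted(found)]


print(assigned_names(SAMPLE))
```

Because only ast.Assign nodes are inspected, augmented assignments such as x += 1 and annotated assignments such as x: int = 1 are left out. Whether those should count is a design decision for your use case.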

Claudio