
PyCharm's code inspection (2022.1.2 Community Edition) complains that I could write some of my regular expressions as simpler equivalents. Once Unicode characters are taken into account, though (and why wouldn't I consider them?), they are not equivalent. Am I mistaken, or is this simply a bug?

Example:

import re

# complains: "'[a-zA-Z0-9_]' can be simplified to '\w'"
pattern1 = re.compile("[a-zA-Z0-9_]")
pattern2 = re.compile(r"\w")
print(bool(pattern1.match("é")))  # False
print(bool(pattern2.match("é")))  # True
matheburg
  • In Python, these are not equivalent. However, it is true that `[A-Za-z0-9_]` = `(?a)\w` (`\w` with `re.A` / `re.ASCII` flag). – Wiktor Stribiżew Jun 13 '22 at 08:46
  • @WiktorStribiżew Thanks. Would you then answer my core question, whether this is a bug in PyCharm, with a yes? – matheburg Jun 13 '22 at 10:36
  • This will still not have any value. The value is in understanding what `\w` matches in Python, and I have explained this quite extensively in another post. You can log a bug fix request/issue for PyCharm where appropriate. – Wiktor Stribiżew Jun 13 '22 at 10:38
  • @WiktorStribiżew I appreciate your answer, though it addresses a question that I did not raise. I was explicitly wondering whether PyCharm has a reason to warn me in that case without any indication of an ASCII flag. To me it just feels like a bug in that linter, and if I don't find a reason, I will report this behaviour at JetBrains. – matheburg Jun 13 '22 at 10:43
  • I have no idea what your settings for all that are. The point is that in Python `re`, `\w` by default is not equal to `[a-zA-Z0-9_]` **but** in Java and PHP they ARE equal. So, that may not be a bug. The linked post is just another way of saying "*the message you see is not correct when using Python `re`*", and it already is an answer for this question. – Wiktor Stribiżew Jun 13 '22 at 10:46
  • I am working with Python's `re`, and the code inspection is aware of that (see my question). If you are not aware of PyCharm's code inspection, you may just not be the right person to answer this question even though it is related to regular expressions ;) – matheburg Jun 13 '22 at 10:48
  • Follow-up: the issue is tracked at https://youtrack.jetbrains.com/issue/PY-54668/Incorrect-warning-code-inspection-proposes-non-equivalent-simplifications-given-Unicode – matheburg Jul 20 '22 at 09:46
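
The point raised in the comments, that `[a-zA-Z0-9_]` and `\w` only coincide under `re.A` / `re.ASCII`, can be checked with a small sketch. The sample characters and variable names below are chosen purely for illustration and are not part of the original question; the check covers only these samples, not every code point.

```python
import re

# The explicit ASCII character class from the question.
ascii_class = re.compile(r"[a-zA-Z0-9_]")
# \w on a str pattern is Unicode-aware by default in Python 3.
unicode_word = re.compile(r"\w")
# \w restricted to ASCII; equivalent to the inline form r"(?a)\w".
ascii_word = re.compile(r"\w", re.ASCII)

# Illustrative samples: ASCII word characters, non-ASCII letters,
# a non-ASCII digit, and two non-word characters.
samples = ["a", "Z", "7", "_", "é", "ß", "٣", "-", " "]


def matches(pattern, ch):
    """Return True if the single character ch is matched by pattern."""
    return pattern.fullmatch(ch) is not None


# Characters on which the explicit class and default \w disagree.
mismatches_unicode = [
    ch for ch in samples
    if matches(ascii_class, ch) != matches(unicode_word, ch)
]

# Characters on which the explicit class and ASCII-only \w disagree.
mismatches_ascii = [
    ch for ch in samples
    if matches(ascii_class, ch) != matches(ascii_word, ch)
]

print("differs from default \\w:", mismatches_unicode)
print("differs from \\w with re.ASCII:", mismatches_ascii)
```

For these samples, the default `\w` disagrees with the explicit class on the non-ASCII letters and digit, while the `re.ASCII` variant agrees on all of them. That suggests the proposed simplification is only safe when the ASCII flag is set.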
