I have a regular expression that worked with QRegExp, but is considered invalid by QRegularExpression:

\[[a-zA-Z0-9/^-*]+\]

In most regular expression systems I've come across, the asterisk has no special meaning inside a character class, but here apparently it's still not allowed?

What's worse, the backslash loses its role as an escape character, so this is still invalid:

\[[a-zA-Z0-9/^-\*]+\]

(Note: for clarity I'm ignoring the extra backslash escaping that string literals need here.)

I can get the desired result with QRegularExpression by writing:

\[([a-zA-Z0-9/^-]|\*)+\]

But still wondering: Why can't I use an asterisk inside [] in a QRegularExpression?


#!/usr/bin/env python3
from PySide6 import QtCore

r = QtCore.QRegularExpression(r'\[[a-zA-Z0-9/^-*]+\]')
print(r.isValid())

r = QtCore.QRegularExpression(r'\[[a-zA-Z0-9/^-\*]+\]')
print(r.isValid())

r = QtCore.QRegularExpression(r'\[([a-zA-Z0-9/^-]|\*)+\]')
print(r.isValid())

produces

False
False
True

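Update: @G.M. figured it out: * is fine, but ^ and - are the problem. The - turns ^-* into a character range, and that range is out of order (^ is ASCII 94, * is ASCII 42):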

#!/usr/bin/env python3
from PySide6 import QtCore

good = '[A*m^-2]'
bad = '[2 + 7]'

# My original regex
r = QtCore.QRegularExpression(r'\[[a-zA-Z0-9/^-*]+\]')
print('Valid' if r.isValid() else 'Not valid')
print(r.match(good).hasMatch())
print(r.match(bad).hasMatch())
print()

# Move ^ and - to the end of the class
r = QtCore.QRegularExpression(r'\[[a-zA-Z0-9/*^-]+\]')
print('Valid' if r.isValid() else 'Not valid')
print(r.match(good).hasMatch())
print(r.match(bad).hasMatch())
print()

# Or escape them
r = QtCore.QRegularExpression(r'\[[a-zA-Z0-9/\^\-*]+\]')
print('Valid' if r.isValid() else 'Not valid')
print(r.match(good).hasMatch())
print(r.match(bad).hasMatch())

produces

Not valid
QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object (pattern is '\[[a-zA-Z0-9/^-*]+\]')
False
QRegularExpressionPrivate::doMatch(): called on an invalid QRegularExpression object (pattern is '\[[a-zA-Z0-9/^-*]+\]')
False

Valid
True
False

Valid
True
False
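
The same failure can be reproduced without Qt. This is a minimal sketch, assuming Python's standard `re` module as a stand-in: it is a different engine from the one behind QRegularExpression, but it also rejects a range whose end sorts before its start. The helper name `compiles` is made up for illustration.

```python
import re

# Code points of the two characters that '-' joins into a range.
caret, star = ord('^'), ord('*')
print(caret, star)  # 94 42

def compiles(pattern):
    """Hypothetical helper: True if `pattern` compiles with Python's re."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

original = r'\[[a-zA-Z0-9/^-*]+\]'   # '^-*' is read as a range from 94 down to 42
reordered = r'\[[a-zA-Z0-9/*^-]+\]'  # '-' comes last, so it is a literal dash
escaped = r'\[[a-zA-Z0-9/\^\-*]+\]'  # '-' is escaped, so it is a literal dash

print(compiles(original))   # False
print(compiles(reordered))  # True
print(compiles(escaped))    # True
```

Only the position or escaping of `-` changes between the three patterns; the `*` is untouched in all of them.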
Michael Clerx
  • Please edit your question to provide a minimal reproducible example. – G.M. Jun 08 '23 at 16:09
  • Note that `-` *does* have special meaning within `[...]`. So do you really mean to specify the character range `^-*` or should it be the three discrete characters `^*-`? – G.M. Jun 08 '23 at 16:28
  • Good spot! So does ^. It works if (1) I change the order so that - and ^ are at the end or (2) I escape ^ and - – Michael Clerx Jun 08 '23 at 16:36
  • @MichaelClerx I'm not an expert, but that regexp doesn't seem valid to me. As G.M. correctly points out, the dash has a very important meaning: if you meant the literal dash character, it should have been escaped, but if you wanted to use it as a character range delimiter, the order is wrong, since the asterisk comes before the caret. – musicamante Jun 08 '23 at 16:36
  • @MichaelClerx character ranges must always be in ascending order, from the lowest value to the highest. Using ASCII characters (take [this list](https://www.ascii-code.com/)), the asterisk is 42, while the caret is 94. Also, a range from `*` to `^` would make the preceding ranges partly redundant, since it *already* includes digits and upper case letters. In any case, if you want to test your regex, consider using https://regex101.com/. – musicamante Jun 08 '23 at 16:40
  • Thanks @musicamante. I guess some implementations treat a `-` that can't be parsed as a character range as a literal "-", because the original regex worked fine in QRegExp but no longer does now that I have ported to the newer QRegularExpression class. – Michael Clerx Jun 08 '23 at 16:43
  • @MichaelClerx In my opinion, treating the dash as literal if its parsing fails is not correct, and it's probably an "unwanted feature" (but I'd consider it a bug, as it would be an illegal regex); that's also why QRegularExpression was introduced, since it has a more reliable and robust support than the simple QRegExp had. In any case, when you want to match characters that are not letters or numbers, always consider escaping and verify the regex syntax anyway. – musicamante Jun 08 '23 at 16:50
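
The escaping advice in the last comment can be automated. The following is a rough sketch, assuming Python's standard `re` module rather than QRegularExpression so that it runs without Qt; `char_class` is a hypothetical helper, not part of either library. It escapes the punctuation before placing it in the class, so the order of `^`, `-` and `*` no longer matters.

```python
import re

def char_class(ranges, literals):
    """Hypothetical helper: build a character class from trusted ranges
    (e.g. 'a-z') plus literal characters that are escaped first."""
    return '[' + ranges + re.escape(literals) + ']'

# The literals are given in the same order as in the original, failing regex.
pattern = re.compile(r'\[' + char_class('a-zA-Z0-9', '/^-*') + r'+\]')

good = '[A*m^-2]'
bad = '[2 + 7]'

print(bool(pattern.search(good)))  # True
print(bool(pattern.search(bad)))   # False
```

Qt has a static `QRegularExpression.escape()` as well, which could play the role of `re.escape` here.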
