Consider the following regex, which checks for password strength. It has the start and end string anchors, to ensure it's matching the entire string.

import re

# Anchored with ^ and $ so that the whole string has to match.
pattern = re.compile(r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[$@$!%*#?&.])[A-Za-z\d$@$!%*#?&.]{8,}$')
while True:
    user_pass = input('Enter a secure password: ')
    if re.fullmatch(pattern, user_pass):
        print('Successfully changed password')
        break
    else:
        print('Not secure enough. Ensure pass is at least 8 characters long with at least one upper and lowercase letter, number,'
              ' and special character.')

I noticed Python 3.5 has a re.fullmatch() which appears to do the same thing, but without the string anchors:

pattern = re.compile(r'(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[$@$!%*#?&.])[A-Za-z\d$@$!%*#?&.]{8,}')
while True:
    user_pass = input('Enter a secure password: ')
    if re.fullmatch(pattern, user_pass):
        print('Successfully changed password')
        break
    else:
        print('Not secure enough. Ensure pass is at least 8 characters long with at least one upper and lowercase letter, number,'
              ' and special character.')

Is this the intended purpose of fullmatch? Are there any situations where this could cause unintended issues?

Chris
  • Why not just use the anchors? It's fewer extra characters (2 vs 4). That way, people who are familiar with regex in general, but maybe not with some Python-specific stuff, will know what you mean. – Joran Beasley Jun 18 '16 at 17:45
  • @JoranBeasley That isn't helpful in the least. – Chris Jun 18 '16 at 17:46
  • You have to type more and it's less clear to people who know regex ... I'm not sure why they even added that ... as such it does not answer your question, I know ... that's why it's a comment. – Joran Beasley Jun 18 '16 at 17:47
  • @JoranBeasley You're not addressing the question. – Chris Jun 18 '16 at 17:47
  • The point is that sometimes you have to use an anchor explicitly, e.g. in lookahead conditions. Thus, the title is a bit ambiguous, you can't just forget about using anchors. Only do that when you know what you are doing. – Wiktor Stribiżew Jun 18 '16 at 19:21
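
The point about lookaheads can be illustrated with a small sketch. The rule below ("the whole string must be digits") and both patterns are made up for illustration and are not from the question. A lookahead consumes nothing, so `fullmatch()` does not force it to reach the end of the string; if that is what you want, the `$` still has to be written inside the lookahead.

```python
import re

# Hypothetical patterns, chosen only to show the effect.
loose = re.compile(r'(?=\d+)\w+')    # lookahead is satisfied by digits at the start
strict = re.compile(r'(?=\d+$)\w+')  # lookahead must run to the end of the string

loose_hit = loose.fullmatch('123abc')    # matches: lookahead sees '123', \w+ takes the rest
strict_hit = strict.fullmatch('123abc')  # no match: '\d+$' cannot reach the end
strict_digits = strict.fullmatch('12345')

print(loose_hit, strict_hit, strict_digits)
```

So `fullmatch()` anchors the overall match, but it does not anchor sub-expressions such as lookaheads.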

1 Answer

The fullmatch() function and regex.fullmatch() method are new in Python 3.4.

The changelog is very explicit about it:

"This provides a way to be explicit about the goal of the match, which avoids a class of subtle bugs where $ characters get lost during code changes or the addition of alternatives to an existing regular expression."

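So the way you use it is indeed the intended purpose of this feature. It should not lead to unexpected issues: fullmatch() behaves as if the pattern as a whole were anchored at the start and the end of the string, rather than relying on hand-written ^ and $.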
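
As a rough check, here is a minimal sketch comparing the two approaches. It uses a shortened stand-in for the password pattern (an assumption made for readability, not the pattern from the question), and the sample strings are made up. The one place the two differ is a trailing newline: `$` also matches just before a newline at the end of the string, whereas `fullmatch()` requires the pattern to consume everything.

```python
import re

# Shortened, hypothetical stand-in for the pattern in the question.
ANCHORED = re.compile(r'^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$')
UNANCHORED = re.compile(r'(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}')

def check_anchored(s):
    """Explicit ^...$ anchors combined with match()."""
    return ANCHORED.match(s) is not None

def check_fullmatch(s):
    """No anchors in the pattern; fullmatch() covers the whole string."""
    return UNANCHORED.fullmatch(s) is not None

samples = ['Passw0rdX', 'short1A', 'Passw0rdX!', 'Passw0rdX\n']
results = {s: (check_anchored(s), check_fullmatch(s)) for s in samples}

for s, (anchored, full) in results.items():
    print(repr(s), anchored, full)
```

The two checks agree on the first three samples; only the last one, with the trailing newline, is accepted by the `^...$` version and rejected by `fullmatch()`. Since `input()` strips the newline, this does not affect the loop in the question.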

Delgan
  • Thank you sir. This is what I was looking for. – Chris Jun 18 '16 at 17:49
  • [Tim Peters observes](https://bugs.python.org/issue16203) `re.match(r'a|ab$', 'ab').group()` returns `'a'`, while `re.fullmatch(r'a|ab', 'ab').group()` returns `'ab'`. So `re.fullmatch(...)` is not simply a replacement for `re.match('...$')`. That's pretty subtle. – unutbu Jun 18 '16 at 18:14
  • I see this actually as an example in favor of fullmatch: the original re should probably have been written `re.match(r'(a|ab)$', 'ab').group()`, but the missing parens caused the end-anchoring `$` to be associated only with the `ab` branch of the alternation. – PaulMcG Aug 31 '18 at 15:33
  • That is enlightening, [in the re docs](https://docs.python.org/3/library/re.html#re.compile) it says "so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions." I wonder if a few is "255" or "4". Maybe it uses lru_cache which by default can handle 255 values. – run_the_race Jun 14 '22 at 09:03
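
The alternation case raised in the comments above can be reproduced directly. This sketch uses only the expressions quoted there; the variable names are mine.

```python
import re

text = 'ab'

# '$' binds only to the 'ab' branch, so the bare 'a' branch wins first.
unparenthesized = re.match(r'a|ab$', text).group()

# Grouping the alternation makes '$' apply to both branches.
grouped = re.match(r'(a|ab)$', text).group()

# fullmatch() applies to the pattern as a whole, with no grouping needed.
full = re.fullmatch(r'a|ab', text).group()

print(unparenthesized, grouped, full)
```

This is the kind of subtle bug the changelog entry refers to: adding an alternative to an anchored pattern silently changes what the `$` applies to.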