The re docs say:

Pattern.match(...)

If zero or more characters at the beginning of string match this regular expression

Pattern.fullmatch(...)

If the whole string matches this regular expression

Pattern.search(...)

Scan through string looking for the first location where this regular expression produces a match

Given the above, why couldn't someone just always use search to do everything? For example:

re.search(r'...')                              # search
re.search(r'^...')   or re.search(r'\A...')    # match
re.search(r'^...$')  or re.search(r'\A...\Z')  # fullmatch

Are match and fullmatch just shortcuts (if they could be called that) for the search method? Or do they have other uses that I'm overlooking?

samuelbrody1249

2 Answers

Credit to @Ruzihm's answer, since parts of my answer derive from it.


Quick overview

A quick rundown of the differences:

  • re.match is anchored at the start: ^pattern
    • Ensures the string begins with the pattern
  • re.fullmatch is anchored at both the start and the end: ^pattern$
    • Ensures the full string matches the pattern (can be especially useful with alternations as described here)
  • re.search is not anchored: pattern
    • Ensures the string contains the pattern
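
To see the three behaviours side by side, here is a minimal sketch. The sample string and pattern are made up for illustration and do not come from the docs.

```python
import re

text = "say hello world"   # illustrative sample string
pattern = r"hello"         # illustrative pattern

# match: must succeed starting at index 0, so this one fails
at_start = re.match(pattern, text)

# fullmatch: must consume the entire string, so this one fails too
whole = re.fullmatch(pattern, text)

# search: scans forward and returns the first occurrence
anywhere = re.search(pattern, text)

print(at_start)          # None
print(whole)             # None
print(anywhere.span())   # (4, 9)
```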

A more in-depth comparison of re.match vs re.search can be found here


With examples:

aa            # string
a|aa          # regex

re.match:     a
re.search:    a
re.fullmatch: aa

 

ab            # string
^a            # regex

re.match:     a
re.search:    a
re.fullmatch: # None (no match)
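
The two examples above can be reproduced with the sketch below. The last two lines are an addition of mine: they suggest that simply wrapping an alternation in ^ and $ is not the same as fullmatch unless the alternation is grouped first.

```python
import re

# First example: string "aa", regex "a|aa"
m1 = re.match(r"a|aa", "aa")
s1 = re.search(r"a|aa", "aa")
f1 = re.fullmatch(r"a|aa", "aa")
print(m1.group(), s1.group(), f1.group())   # a a aa

# Second example: string "ab", regex "^a"
m2 = re.match(r"^a", "ab")
s2 = re.search(r"^a", "ab")
f2 = re.fullmatch(r"^a", "ab")
print(m2.group(), s2.group(), f2)           # a a None

# Ungrouped, the anchors bind to separate branches: (^a) or (aa$)
ungrouped = re.search(r"^a|aa$", "aa")
# Grouped, both anchors apply to the whole alternation
grouped = re.search(r"^(?:a|aa)$", "aa")
print(ungrouped.group(), grouped.group())   # a aa
```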

So what about \A and \Z anchors?

The documentation states the following:

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).

And in the Pattern.fullmatch section it says:

If the whole string matches this regular expression, return a corresponding match object.

And, as initially found and quoted by Ruzihm in his answer:

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with ^ will match at the beginning of each line.

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<re.Match object; span=(4, 5), match='X'>

# The string s = 'A\nB\nX' with its implicit anchors marked (no flags):

\A^A
B
X$\Z

# re.match('X', s)                  no match
# re.search('^X', s)                no match

# ------------------------------------------
# With re.MULTILINE enabled, the same string effectively becomes:

\A^A$
^B$
^X$\Z

# re.match('X', s, re.MULTILINE)    no match
# re.search('^X', s, re.MULTILINE)  match X

With regard to \A and \Z, neither behaves differently under re.MULTILINE, since they only ever match at the start and end of the whole string, never at line boundaries.

So a pattern wrapped in \A and \Z yields the same result with any of the three methods.
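
A small sketch of that claim, reusing the 'A\nB\nX' string from the docs example. The variable names are mine.

```python
import re

s = "A\nB\nX"

# ^ becomes line-aware under MULTILINE; \A stays pinned to index 0
caret = re.search(r"^X", s, re.MULTILINE)
string_start = re.search(r"\AX", s, re.MULTILINE)

# $ becomes line-aware under MULTILINE; \Z stays pinned to the end
dollar = re.search(r"A$", s, re.MULTILINE)
string_end = re.search(r"A\Z", s, re.MULTILINE)

print(caret.span())     # (4, 5)
print(string_start)     # None
print(dollar.span())    # (0, 1)
print(string_end)       # None
```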


Answer (line anchors vs string anchors)

What this tells me is that re.match and re.fullmatch do not behave like the line anchors ^ and $; they behave like the string anchors \A and \Z.
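
One more case that seems to support this, and which is my addition rather than something from the quotes above: even without MULTILINE, $ also matches just before a single trailing newline, while \Z matches only at the very end of the string. As far as I can tell fullmatch sides with \Z here. A minimal sketch:

```python
import re

s = "abc\n"   # note the trailing newline

# $ tolerates one trailing newline, even without MULTILINE
dollar = re.search(r"^abc$", s)

# \Z does not: the newline is still part of the string
string_end = re.search(r"\Aabc\Z", s)

# fullmatch behaves like the \A...\Z version
full = re.fullmatch(r"abc", s)

print(dollar.group())   # abc
print(string_end)       # None
print(full)             # None
```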

ctwheels

Yes, they can be seen as shortcuts of re.search calls that start with \A or start with \A and end with \Z.

Because \A always specifies the beginning of the string, calling re.search with \A prepended appears to be equivalent to re.match, even in MULTILINE mode. Some examples:

import re

haystack = "A\nB\nZ"

# Plain literal: both succeed at index 0
matchstring = 'A'
x = re.match(matchstring, haystack)                          # Match
y = re.search(r'\A' + matchstring, haystack)                 # Match

# In MULTILINE mode $ matches just before the first newline
matchstring = 'A$\nB'
x = re.match(matchstring, haystack, re.MULTILINE)            # Match
y = re.search(r'\A' + matchstring, haystack, re.MULTILINE)   # Match

# $ cannot match between the newline and 'B', so both fail
matchstring = 'A\n$B'
x = re.match(matchstring, haystack, re.MULTILINE)            # No match
y = re.search(r'\A' + matchstring, haystack, re.MULTILINE)   # No match

The same is true for putting the search string between \A and \Z to emulate fullmatch.
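
A sketch of that equivalence as two small helpers. The names search_as_match, search_as_fullmatch and same are hypothetical and not part of re. The non-capturing group is my assumption about how to make this safe for alternations, and the helpers assume the pattern carries no leading global inline flags such as (?i), which would no longer sit at the start once wrapped.

```python
import re

def search_as_match(pattern, string, flags=0):
    """Hypothetical helper: emulate re.match() with re.search()."""
    # The group keeps \A bound to the whole pattern, not just
    # to the first branch of an alternation.
    return re.search(r"\A(?:" + pattern + r")", string, flags)

def search_as_fullmatch(pattern, string, flags=0):
    """Hypothetical helper: emulate re.fullmatch() with re.search()."""
    return re.search(r"\A(?:" + pattern + r")\Z", string, flags)

def same(a, b):
    """Two results agree if both are None or both cover the same span."""
    if a is None or b is None:
        return a is b
    return a.span() == b.span()

haystack = "A\nB\nZ"
patterns = ["A", "A$\nB", "A\n$B", "A|A\nB\nZ"]

results = []
for p in patterns:
    for flags in (0, re.MULTILINE):
        results.append(same(re.match(p, haystack, flags),
                            search_as_match(p, haystack, flags)))
        results.append(same(re.fullmatch(p, haystack, flags),
                            search_as_fullmatch(p, haystack, flags)))

print(all(results))   # True

# One place where the two do seem to differ: the optional pos
# argument of compiled patterns. match() anchors at pos, while
# \A stays tied to the real start of the string.
from_pos = re.compile(r"b").match("ab", 1)       # matches 'b' at index 1
anchored = re.compile(r"\Ab").search("ab", 1)    # None
```

So for the module-level functions the shortcut reading holds in these cases, but with a compiled pattern and a start position, match gives you anchoring at an arbitrary index, which \A cannot express.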


Not including \A / \Z:

No, they treat MULTILINE differently. From the documentation:

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.

...

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<re.Match object; span=(4, 5), match='X'>

Likewise, in MULTILINE mode, fullmatch() matches at the beginning and end of the string, and search() with '^...$' matches at the beginning and end of each line.
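
The fullmatch side of that statement can be checked the same way. This is a minimal sketch reusing the string from the docs example.

```python
import re

s = "A\nB\nX"

# fullmatch must cover the whole string, MULTILINE or not
full = re.fullmatch(r"B", s, re.MULTILINE)

# search with ^...$ only needs to cover one whole line
line = re.search(r"^B$", s, re.MULTILINE)

print(full)          # None
print(line.span())   # (2, 3)
```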


Ruzihm
  • @Ruzihm, what about using \A and \Z then instead of ^ and $? Would that make them always equivalent? – samuelbrody1249 Nov 08 '19 at 21:43
  • @samuelbrody1249 excellent point. I edited my answer to address that. I couldn't find a difference, even when profiling their runtime (not shown in answer) – Ruzihm Nov 08 '19 at 22:18