I can't understand the behavior of the following code.

>>> import re
>>> text = 'been'
>>> r = re.compile(r'b(e)*')
>>> r.search(text).group()
'bee' #makes sense
>>> r.findall(text)
['e'] #makes no sense

I have read some existing questions and answers about capturing groups, but I am still confused. Could someone please explain this?

Bhargav Rao
  • Have you read the docs for [`re.search`](https://docs.python.org/2/library/re.html#re.search) vs. [`re.findall`](https://docs.python.org/2/library/re.html#re.findall)? They do different things... – MattDMo May 30 '15 at 23:26
  • If you use one of them incorrectly a cat dies somewhere in the world. – Padraic Cunningham May 31 '15 at 01:07

2 Answers

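When a pattern contains a capture group, findall returns only the content of the capture group, no longer the whole match. Because the group in b(e)* is repeated, it holds only its last repetition, which is why you get a single 'e'.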
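To get the whole match back from findall, one option is to make the group non-capturing. A minimal sketch, reusing the text from the question (the variable names are only for illustration):

```python
import re

text = 'been'

# Capturing group: findall returns the group's content (last repetition only).
with_group = re.findall(r'b(e)*', text)

# Non-capturing group (?:...): findall returns the whole match again.
without_group = re.findall(r'b(?:e)*', text)

# Wrapping the whole pattern in a single capturing group also gives the whole match.
whole_in_group = re.findall(r'(b(?:e)*)', text)

print(with_group)      # ['e']
print(without_group)   # ['bee']
print(whole_in_group)  # ['bee']
```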

Although this behaviour may look strange, it is very useful for easily extracting parts of a string in a particular context (a substring that comes before or after something else), especially since Python's re module doesn't support variable-length lookbehinds.
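As a rough illustration of that use case, the sketch below pulls out the numbers that follow "price:" with any amount of whitespace. The sample string and the "price:" label are made up for the example; they are not from the question.

```python
import re

# Hypothetical sample text, invented for this example.
line = 'price: 10, price:   25, cost: 7'

# The context (price: plus optional whitespace) is matched but not returned;
# findall gives back only what the capture group holds.
prices = re.findall(r'price:\s*(\d+)', line)
print(prices)  # ['10', '25']

# The same idea written as a lookbehind needs a variable-length prefix,
# which the re module rejects when compiling the pattern.
try:
    re.compile(r'(?<=price:\s*)\d+')
    lookbehind_compiled = True
except re.error:
    lookbehind_compiled = False
print(lookbehind_compiled)  # False
```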

Casimir et Hippolyte

The answer is explained simply in the Regex HOWTO.

As you can read there, group() returns the string matched by the regular expression:

group() returns the substring that was matched by the RE.

The behaviour of findall, on the other hand, is described in the documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group

So you are getting the text matched by the capture group, not the whole match.

An experiment to illustrate:

>>> r = re.compile(r'(b)(e)*')
>>> r.findall(text)
[('b', 'e')]

Here the regex has two capturing groups, so findall returns a list of tuples, one tuple per match, each holding the text captured by the groups.
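If you want both the whole match and the groups, finditer may be a better fit, since it yields match objects instead of plain strings. A small sketch using the same pattern and text (the results list is just for illustration):

```python
import re

text = 'been'
pattern = re.compile(r'(b)(e)*')

results = []
for match in pattern.finditer(text):
    # group(0) is the whole match; groups() holds the captured groups.
    results.append((match.group(0), match.groups()))

print(results)  # [('bee', ('b', 'e'))]
```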

Bhargav Rao