In Python 3.4, I'm using the re library (the regex library gives the same result), and I'm getting a result I don't expect.
I have a string s = 'abc'. I would expect the following regex:
re.match(r"^(.*?)(b?)(.*?)$", s).groups()
...to match with three non-empty groups, namely:
('a', 'b', 'c')
because the middle part of the pattern, (b?), is greedy. Instead, only the last group is non-empty:
('', '', 'abc')
I get the same result with both of the following:
re.match(r"^(.*?)(b?)(.*?)$", s).groups()  # explicit ^ and $
re.fullmatch(r"(.*?)(b?)(.*?)", s).groups()  # fullmatch()
If I make the first group greedy, i.e. (.*) instead of (.*?), then the result is:
('abc', '', '')
That much I'd expect, because the greedy .* consumes the entire string before the other groups get to see it.
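For reference, here is a minimal script that reproduces the three results described so far. The variable names (lazy, full, greedy) are just labels I picked for this sketch; nothing else is assumed beyond the standard re module.

```python
import re

s = 'abc'

# Lazy first group, with explicit ^ and $ anchors.
lazy = re.match(r"^(.*?)(b?)(.*?)$", s).groups()

# Same pattern without anchors, relying on fullmatch() instead.
full = re.fullmatch(r"(.*?)(b?)(.*?)", s).groups()

# Greedy first group: (.*) instead of (.*?).
greedy = re.match(r"^(.*)(b?)(.*?)$", s).groups()

print(lazy)    # ('', '', 'abc')
print(full)    # ('', '', 'abc')
print(greedy)  # ('abc', '', '')
```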
The regex I'm trying to build is, of course, more complicated than this; otherwise, I could just exclude the b from the left and right groups:
re.match(r"^([^b]*?)(b?)([^b]*?)$", s).groups()
But in my real use case, the middle group is a string several characters long, any of which might show up on their own in the left or right groups, so I can't just exclude those chars from the left or right groups.
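To make that limitation concrete, here is a small sketch. The single-character case works with exclusion, but as soon as the middle group is a multi-character string, excluding its characters rules out inputs I need to accept. The middle string 'bc' and the input 'cabcb' are made up for illustration and are not my real data.

```python
import re

# Single-character middle: excluding 'b' on both sides works.
single = re.match(r"^([^b]*?)(b?)([^b]*?)$", 'abc').groups()
print(single)  # ('a', 'b', 'c')

# Hypothetical multi-character middle 'bc'. Excluding both of its
# characters from the outer groups means a stray 'c' or 'b' outside
# the middle can no longer be matched at all.
multi = re.match(r"^([^bc]*?)((?:bc)?)([^bc]*?)$", 'cabcb')
print(multi)  # None
```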
I've looked at other questions tagged for regex-greedy, and none seems to answer this question, although I suspect that ctwheels' reply in python non-greedy match is behind my problem (the optionality of the first two groups prevents the regex engine from actually failing until it gets to the end of the string, and then it only has to backtrack a little ways to get a non-failing match).
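If that suspicion is right, the b? group should only be tried at the position where the lazy first group initially stops, which is position 0. The sketch below is my attempt to check this by comparing 'bc' (where a b sits at position 0) with 'abc', and printing the span of group 2. The input 'bc' is just an extra test string I added for comparison.

```python
import re

pattern = re.compile(r"^(.*?)(b?)(.*?)$")

results = {}
for text in ('bc', 'abc'):
    m = pattern.match(text)
    # Record the groups and where group 2 (the b? part) matched.
    results[text] = (m.groups(), m.span(2))
    print(text, m.groups(), m.span(2))

# bc  ('', 'b', 'c')   (0, 1)  -> b? finds a 'b' at position 0
# abc ('', '', 'abc')  (0, 0)  -> b? matches empty at position 0
```

In both cases group 2 is anchored at position 0, which seems consistent with the idea that the engine never needs to lengthen the first group: the lazy third group can always stretch to the end of the string instead.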