In regex, we have greedy and lazy quantifiers. The greedy quantifier {n,m} matches the preceding atom/character/group a minimum of n and a maximum of m occurrences, inclusive.

If I have a collection of strings:

a
aa
aaa
aaaa
aaaaaaaaaa

With a{2,4}, it matches:

  • nothing on first line
  • aa on second
  • aaa on third
  • aaaa on fourth
  • (aaaa), (aaaa), and (aa) on fifth line

That makes sense.
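
A minimal Python sketch of the greedy case above, assuming Python's `re` module handles `{n,m}` the same way as the engine being used; the `lines` list and variable names are illustrative only.

```python
import re

# The five sample strings from the list above.
lines = ["a", "aa", "aaa", "aaaa", "a" * 10]

# Greedy: each match grabs as many a's as it can, up to 4.
greedy = re.compile(r"a{2,4}")
greedy_matches = [greedy.findall(line) for line in lines]

for line, found in zip(lines, greedy_matches):
    print(f"{line!r}: {found}")
```

On the ten-character line this should yield two four-character matches followed by one two-character match.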

However, with the lazy quantifier a{2,4}?, I get:

  • nothing on first line
  • aa on second line
  • aa on third line
  • (aa) and (aa) on fourth line
  • (aa), (aa), (aa), (aa), and (aa) on fifth line

That also makes sense: each match is the shortest one possible.
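
The lazy case can be sketched the same way, again assuming Python's `re` module as a stand-in engine; the names are illustrative only.

```python
import re

# The five sample strings from the question.
lines = ["a", "aa", "aaa", "aaaa", "a" * 10]

# Lazy: with nothing after the quantifier, each match stops at the minimum of 2.
lazy = re.compile(r"a{2,4}?")
lazy_matches = [lazy.findall(line) for line in lines]

for line, found in zip(lines, lazy_matches):
    print(f"{line!r}: {found}")
```

With nothing following the quantifier, the upper bound of 4 is never reached in this sketch.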

What I want to clarify: is there any use in giving a lazy quantifier of the form {n,m}? a max value m (in this case, the 4 in {2,4}?)? Isn't the result always the same as with {2,}?

Is there a scenario where passing a max (like the 4 in {2,4}?) is useful in a lazy quantifier?

Disclaimer: I am actually using the regular expression to search inside Vim (/a\{-2,4}), not in any scripting language. I think the principle of the question is still the same.

Iggy

1 Answer

It matters when you need to consider what follows the lazily quantified expression. Laziness keeps the quantified expression from consuming characters that a later expression in the concatenation could match. Consider the string aaaaab:

  1. The string as a whole is not matched by a{2,4}?b, as there are too many a's for a{2,4}? to match. (An unanchored search still finds aaaab, starting at the second a.)
  2. The string as a whole is matched by a{2,}?b, since a{2,}? can match as many a's as necessary.
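
A minimal sketch of the two cases above, assuming Python's `re` module; `fullmatch` stands in for "the string as a whole", and `search` shows what an unanchored search finds. The variable names are illustrative only.

```python
import re

text = "aaaaab"

bounded = re.compile(r"a{2,4}?b")    # lazy, at most 4 a's
unbounded = re.compile(r"a{2,}?b")   # lazy, no upper bound

# Whole-string matching: the bounded form cannot cover five a's.
bounded_full = bounded.fullmatch(text)
unbounded_full = unbounded.fullmatch(text)

# Unanchored search: the bounded form has to start one character later.
bounded_search = bounded.search(text)
unbounded_search = unbounded.search(text)

print(bounded_full)                                        # None
print(unbounded_full.group())                              # aaaaab
print(bounded_search.group(), bounded_search.start())      # aaaab 1
print(unbounded_search.group(), unbounded_search.start())  # aaaaab 0
```

So once something follows the quantifier, the max value changes both whether and where a match is found.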
chepner
  • In case of aaaaab, a{2,4}b would match aaaab, right? However, I was expecting a{2,4}?b to match aab but when I tried it instead it matches aaaab. Interesting. – Iggy Jan 18 '22 at 00:10
  • I think the issue is that lazy or not, it's still going to start as far left as possible. Consider something like `re.match(r'a*(a{2,4}?b)', 'aaaaab')`. – chepner Jan 18 '22 at 00:14
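
A small sketch of the leftmost-start point from the comments, assuming Python's `re` module and the input string aaaaab from the answer (the input is an assumption here). A lazy quantifier only shortens a match from its right end; it does not move the starting position. Putting a greedy `a*` in front, as in the pattern from the last comment, lets the captured group come out as aab.

```python
import re

text = "aaaaab"  # assumed input, taken from the answer's example

# Lazy alone: the search cannot succeed from index 0 (five a's exceed the max
# of 4), so the leftmost successful start is index 1, which gives 'aaaab'.
plain = re.search(r"a{2,4}?b", text)

# A leading greedy a* absorbs the extra a's, leaving the group the minimum two.
prefixed = re.match(r"a*(a{2,4}?b)", text)

print(plain.group())      # aaaab
print(prefixed.group(1))  # aab
```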