
As far as I know, non-greedy matching is not part of Basic Regular Expressions (BRE) or Extended Regular Expressions (ERE). However, the behaviour of different versions of grep (BSD and GNU) seems to suggest otherwise.

For example, take the following string:

string="hello_my_dear_polo"

Using GNU grep:

The following are a few attempts to extract hello from the string.

BRE Attempt (fails):

$ grep -o "hel.*\?o" <<< "$string"
hello_my_dear_polo

The output is the entire string, which suggests that the non-greedy quantifier does not work in BRE. Note that I have only escaped ?, since * does not lose its meaning and need not be escaped.

ERE Attempt (fails):

$ grep -oE "hel.*?o" <<< "$string"
hello_my_dear_polo

Enabling the -E option yields the same output, suggesting that non-greedy matching is not part of ERE either. Escaping was not needed here since we are using ERE.
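For comparison, here is a small sketch of the plain greedy ERE with no trailing ?. It uses the same sample string and pipes it through printf instead of a here-string, so it does not depend on bash. The variable name greedy is only illustrative. If an ERE attempt with ? prints the same thing as this, the ? made no visible difference.

```shell
string="hello_my_dear_polo"

# Plain greedy ERE, no trailing "?": .* takes the longest possible run,
# so the match extends to the last "o" on the line.
greedy=$(printf '%s\n' "$string" | grep -oE 'hel.*o')
printf '%s\n' "$greedy"
```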

PCRE Attempt (succeeds):

$ grep -oP "hel.*?o" <<< "$string"
hello

With the -P option for PCRE, the non-greedy quantifier is supported, and hence we get the desired output of hello. Escaping was not needed here since we are using PCRE.

Using BSD grep:

Here are a few attempts to extract hello from the string.

BRE Attempt (fails):

$ grep -o "hel.*\?o" <<< "$string"

Using BRE I get no output from BSD grep.

ERE Attempt (succeeds):

$ grep -oE "hel.*?o" <<< "$string"
hello

After enabling the -E option, I am surprised that I was able to extract my desired output. My question is on the output I am getting from this attempt.

PCRE Attempt (fails):

$ grep -oP "hel.*?o" <<< "$string"
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]

Using the -P option gave me a usage error, which was expected since the BSD version of grep does not support PCRE.

So my question is: why does using ERE with the non-greedy quantifier yield the desired output on BSD grep but not on GNU grep?

Is this a bug, an undocumented feature of BSD egrep, or my misunderstanding of the output?

Randall
jaypal singh
  • For this case you could use `hel[^o]*o`. – Gumbo May 04 '14 at 08:50
  • Thanks @Gumbo for your comment. I agree, one would use your suggested approach for resolution; however, my question is about the odd behavior of `grep` on BSD with the `-E` option turned on. – jaypal singh May 04 '14 at 15:12

1 Answer

The double quantifier is simply a syntax error and could result in either an error message or undefined behavior. It would arguably be better if you got an error message.

Perl extensions to regex post-date POSIX by a large margin; at the time these tools were written, it was extremely unlikely that someone would try to use this wacky syntax for anything. Non-greedy matching was only introduced in Perl 5, in the mid-1990s.
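If the goal is simply to stop at the first o without relying on *?, a sketch along the lines of the negated bracket expression suggested in the comments on the question should behave the same on both greps. The variable names first, prefix and short are only illustrative, and the parameter-expansion variant assumes the wanted text starts at the beginning of the string.

```shell
string="hello_my_dear_polo"

# [^o]* cannot run past an "o", so the match ends at the first "o" after "hel".
first=$(printf '%s\n' "$string" | grep -oE 'hel[^o]*o')
printf '%s\n' "$first"

# Same idea without grep: %% strips the longest suffix matching "o*",
# i.e. everything from the first "o" onward; the "o" is then put back.
prefix=${string%%o*}
short="${prefix}o"
printf '%s\n' "$short"
```

Both commands print hello for this string. The bracket form only works when the terminator is a single character; for a multi-character terminator a PCRE-capable tool is the simpler route.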

tripleee
  • Thanks @tripleee. Getting an error would probably have made more sense. However, neither variant of `grep` shows an error, and nothing is documented in the man page. I am running `grep (BSD grep) 2.5.1-FreeBSD`, and the man page suggests it was last updated on `July 28, 2010`. [This](https://wiki.freebsd.org/BSDgrep) page seems to confirm that BSD grep is 100% GNU compatible. – jaypal singh May 04 '14 at 15:22
  • Yeah, but the pertinent standards have not been changed a lot since the early 2000s, at which point Perl was still the primary provider of Perl regex syntax. – tripleee May 04 '14 at 16:13