-1

Using grep (GNU grep 3.3) to search for all words with three consecutive double-letters (resulting in "bookkeeper"):

grep -E "((.)\2){3}" /usr/share/dict/american-english

Changing this to search for words with three double-letters, each of them followed by the letter "i" (resulting in "Mississippi"):

grep -E "((.)\2i){3}" /usr/share/dict/american-english

Changing this to search for words with three double-letters, each of them followed by any single letter (with a couple of results):

grep -E "((.)\2.){3}" /usr/share/dict/american-english

Changing this to search for words with three double-letters separated by an optional, single letter (even more results):

grep -E "((.)\2.?){3}" /usr/share/dict/american-english

Now, finally, my original task: Search for all words containing three double-letters:

grep -E "((.)\2.*){3}" /usr/share/dict/american-english

But this results in an empty set. Why? How can .? match something .* does not?

SlowFox
  • 63
  • 5

1 Answers1

0

The POSIX regex engine does not handle patterns with back-references well, matching back references is an NP complete problem might provide some hints on why it is that difficult.

Since you are using a GNU grep, the problem is easily solved with the PCRE engine,

grep -P '((.)\2.*){3}' file

since the PCRE engine can handle back-references in a more efficient way than the POSIX regex engine.

See the online demo.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Hm, I'd consider "not well" to be a bug in this case. If it took (much) time: that's how it is. Returning a wrong result should not happen. – SlowFox May 27 '22 at 09:11