Was trying to get words with consecutive repeated letters occurring twice or thrice. Not able find a way to use quantifier and capture group using ERE
$ grep --version | head -n1
grep (GNU grep) 2.25
$ # consecutive repeated letters occurring twice
$ grep -m5 -xiE '[a-z]*([a-z])\1[a-z]*[a-z]*([a-z])\2[a-z]*' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
$ # no output for this, why?
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Works with -P
though
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){3}' /usr/share/dict/words
Chattahoochee
McConnell
Mississippi
Mississippian
Mississippians
Thanks Casimir et Hippolyte for coming up with simpler input and regex to test this behavior
$ echo 'aazbb' | grep -E '(([a-z])\2[a-z]*){2}' || echo 'No match'
aazbb
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*' || echo 'No match'
aazbbycc
$ echo 'aazbbycc' | grep -P '(([a-z])\2[a-z]*){3}' || echo 'No match'
aazbbycc
$ # failing case
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}' || echo 'No match'
No match
Same behavior seen with sed
as well
$ sed --version | head -n1
sed (GNU sed) 4.2.2
$ echo 'aazbb' | sed -E '/(([a-z])\2[a-z]*){2}/! s/.*/No match/'
aazbb
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*/! s/.*/No match/'
aazbbycc
$ # failing case
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){3}/! s/.*/No match/'
No match
Related search links, I checked some of them, but didn't get anything close to this question
If this is solved in newer version of grep
or sed
, let me know. Also, if the issue is seen in non-GNU implementations