10

I need a regular expression that finds, with grep, a character that repeats 4 or more times.

I know that the quantifier is {n,}, so to find lines where, for example, the character "g" repeats 4 or more times, the command should in theory be, according to the grep man page:

grep "g{4,}" textsamplefile

But it doesn't work. Any help?

The occurrences can be separated by other letters. For example, these are valid matches:

gexamplegofgvalidgmatchg

gothergvalidgmatchgisghereg

ggggother

Igor Soloydenko
Goncatin

1 Answer

16

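Without -E, grep uses basic regular expressions, where unescaped braces are literal characters, so your pattern looked for the literal text g{4,}. Change your grep command to: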

grep -E 'g{4,}' input_file # --> this will extract only the lines containing chains of 4 or more g
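To see the difference between the two syntaxes, here is a minimal sketch, assuming GNU grep. The temporary file and its sample lines are made up for illustration.

```shell
# Hypothetical sample lines, written to a temporary file for illustration.
sample=$(mktemp)
printf '%s\n' 'ggggother' 'gexample' 'abc' > "$sample"

# BRE (the default): unescaped braces are literal, so this looks for the
# literal text "g{4,}". grep -c prints 0 and exits non-zero, hence "|| true".
bre_plain=$(grep -c 'g{4,}' "$sample" || true)

# BRE with escaped braces: \{4,\} is the interval operator.
bre_escaped=$(grep 'g\{4,\}' "$sample")

# ERE (-E): braces act as the interval operator without escaping.
ere=$(grep -E 'g{4,}' "$sample")

echo "plain BRE match count: $bre_plain"
echo "escaped BRE match: $bre_escaped"
echo "ERE match: $ere"
rm -f "$sample"
```

Both the escaped BRE form and the -E form select the same line, so which one to use is mostly a matter of readability.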

If you want all the lines that contain chains of 4 or more identical characters, the regex becomes:

grep -E '(.)\1{3,}' input_file
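As a rough illustration of the chain pattern, the sketch below runs it on a few made-up lines. It assumes a grep that accepts backreferences with -E, as GNU grep does.

```shell
# Hypothetical sample lines, written to a temporary file for illustration.
sample=$(mktemp)
printf '%s\n' 'ggggother' 'xxxx-yyy' 'gexamplegofg' 'aaab' > "$sample"

# (.) captures one character; \1{3,} requires at least three more copies
# of it immediately afterwards, i.e. a run of 4 or more identical characters.
chains=$(grep -E '(.)\1{3,}' "$sample")

echo "$chains"
rm -f "$sample"
```

Only the lines with a run of four identical characters are selected; 'aaab' has a run of three and 'gexamplegofg' has its g's spread out, so neither matches.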

If you do not need chains, only lines where g appears 4 or more times anywhere:

grep -E '([^g]*g){4}' input_file
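The pattern can be checked against the sample lines from the question. This is only a sketch: the temporary file and the two non-matching lines are assumptions added for contrast.

```shell
# Sample lines from the question plus two hypothetical non-matching lines.
sample=$(mktemp)
printf '%s\n' \
  'gexamplegofgvalidgmatchg' \
  'gothergvalidgmatchgisghereg' \
  'ggggother' \
  'gag gum' \
  'no match here' > "$sample"

# [^g]*g consumes any run of non-g characters followed by one g;
# {4} repeats that four times, so the line must contain at least 4 g's.
count=$(grep -cE '([^g]*g){4}' "$sample")
matched=$(grep -E '([^g]*g){4}' "$sample")

echo "$count matching lines:"
echo "$matched"
rm -f "$sample"
```

'gag gum' contains only three g's, so it is left out even though the g's are spread across the line.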

You can generalize this to any character that appears 4 or more times by using:

grep -E '(.)(.*\1){3}' input_file
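A short sketch of the generalized pattern on made-up lines follows. Backreferences with -E are not part of POSIX extended regular expressions, so this assumes a grep that supports them, as GNU grep does.

```shell
# Hypothetical sample lines, written to a temporary file for illustration.
sample=$(mktemp)
printf '%s\n' \
  'gexamplegofgvalidgmatchg' \
  'mississippi' \
  'banana' \
  'abcabcabc' > "$sample"

# (.) captures some character; (.*\1){3} then requires that same character
# to appear three more times later in the line, with anything in between.
repeated=$(grep -E '(.)(.*\1){3}' "$sample")

echo "$repeated"
rm -f "$sample"
```

'banana' and 'abcabcabc' are skipped because no character in them occurs more than three times. Note that the dot matches any character, including spaces, so a line with four spaces would also match.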
Allan
  • I'm not looking for chains, only the lines where the character repeats. For example, a valid match would be exgentgougligshg – Goncatin Dec 21 '17 at 08:44
  • can you try the last regex and let me know if it works? – Allan Dec 21 '17 at 08:53
  • @Goncatin Use `'([^g]*g){4}'` ERE pattern. – Wiktor Stribiżew Dec 21 '17 at 09:09
  • @Wiktor: Thanks for the tip, I have edited the answer! However, would it be possible to generalize it for any character not only `g`? I have tried to find a general regex but I couldn't get one... – Allan Dec 21 '17 at 09:12
  • Edited so that it matches any character that appears 4 or more times anywhere in the line – Allan Dec 21 '17 at 09:32