How can I get a list of the words that have six or more consonants in a row using the grep command?

Question

I want to find a list of words that contain six or more consonants in a row from a number of text files.

I'm pretty new to the Unix terminal, but this is what I have tried:

cat *.txt | grep -Eo "\w+" | grep -i "[^AEOUIaeoui]{6}"

I use the cat command here because it will otherwise include the file names in the next pipe. I use the second pipe to get a list of all the words in the text files.

The problem is the last pipe. I want it to match six consonants in a row; they don't need to be the same consonant. I know one way of solving the problem, but it would create a command longer than this entire post.

grep does not print the filename if you use the `-h` switch. — user1934428, Nov 20 '20 at 13:37
It would be nice if you could provide a sample input and output so we could better understand what you are looking for — Barak Binyamin, Nov 30 '20 at 00:18

score 3 · Answer 1 · answered Nov 20 '20 at 13:00

For the last grep you also need the -E switch, or else you need to escape the curly braces:

cat *.txt | grep -Eo "\w+" | grep -Ei "[^AEOUIaeoui]{6}"
cat *.txt | grep -Eo "\w+" | grep -i "[^AEOUIaeoui]\{6\}"

"I use the cat command here because it will otherwise include the file names in the next pipe"

You can disable this using the -h flag:

grep -hEo "\w+" *.txt | grep -Ei "[^AEOUIaeoui]{6}"
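The difference between the two syntaxes can be checked on a tiny input. This is a minimal sketch, assuming GNU grep; the three sample words are hypothetical and not taken from the question.

```shell
# Hypothetical sample words, one per line (not from the original post).
words=$(printf '%s\n' rhythms banana 'x{6}')

# BRE (no -E): {6} is literal text, so only the line containing "{6}" matches.
bre_literal=$(printf '%s\n' "$words" | grep -i '[^aeiou]{6}')

# ERE (-E): {6} is a repetition count, so six non-vowels in a row are required.
ere=$(printf '%s\n' "$words" | grep -Ei '[^aeiou]{6}')

# BRE with escaped braces behaves like the ERE form.
bre_escaped=$(printf '%s\n' "$words" | grep -i '[^aeiou]\{6\}')

echo "BRE, unescaped braces: $bre_literal"   # x{6}
echo "ERE:                   $ere"           # rhythms
echo "BRE, escaped braces:   $bre_escaped"   # rhythms
```

This is likely why the original command printed nothing useful: without -E it was searching for the literal text {6}.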

score 2 · Accepted Answer · answered Nov 20 '20 at 12:57


You can use

grep -hEio '[[:alpha:]]*[b-df-hj-np-tv-z]{6}[[:alpha:]]*' *.txt

Regex details

[[:alpha:]]* - zero or more letters
[b-df-hj-np-tv-z]{6} - six English consonant letters in a row
[[:alpha:]]* - zero or more letters.

The grep options make the regex search case-insensitive (-i), print only the matched text (-o), and suppress the filenames (-h). The -E option enables POSIX ERE syntax; without it, you would need to escape {6} as \{6\}.
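To see what the surrounding [[:alpha:]]* parts contribute when -o is used, compare the pattern with and without them. This is a small sketch on a hypothetical sample line; LC_ALL=C is an added assumption to keep the bracket ranges ASCII-only.

```shell
# Hypothetical sample line (not from the original post).
line='The Watchstrap broke'

# Without the [[:alpha:]]* padding, -o prints only the six matched consonants.
bare=$(printf '%s\n' "$line" | LC_ALL=C grep -Eio '[b-df-hj-np-tv-z]{6}')

# With the padding, the match is extended to the whole run of letters.
padded=$(printf '%s\n' "$line" | LC_ALL=C grep -Eio '[[:alpha:]]*[b-df-hj-np-tv-z]{6}[[:alpha:]]*')

echo "bare:   $bare"     # tchstr
echo "padded: $padded"   # Watchstrap
```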

Wiktor Stribiżew

Why do we need `[[:alpha:]]*` before and after? – Philippe Nov 20 '20 at 17:34
@Philippe To match any zero or more letters. – Wiktor Stribiżew Nov 20 '20 at 17:48
@Philippe See the [online `grep` demo](https://ideone.com/VYpdHg) to see why `[[:alpha:]]*` is important here. – Wiktor Stribiżew Nov 20 '20 at 17:54

Timur Shtatland · Answer 3 · 2020-11-20T14:33:46.463

Use this Perl one-liner:

perl -lne 'print for grep { /[^aeoui]{6}/i } /\b([a-z]+)\b/ig' in_file.txt

Example:

cat > in_file.txt <<EOF
the abcdfghi aBcdfghi.
ABCDFGHI234
abcdEfgh
EOF

perl -lne 'print for grep { /[^aeoui]{6}/i } /\b([a-z]+)\b/ig' in_file.txt

Output:

abcdfghi
aBcdfghi

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.

The regex uses these modifiers:
/g : Multiple matches.
/i : Case-insensitive matches.

/\b([a-z]+)\b/ig : Match words that consist of 1 or more letters only ([a-z]+), with a word boundary \b on both sides. This way, ABCDFGHI234 does not match, but all 3 words in line 1 (the, abcdfghi, aBcdfghi) match. This may be important for some applications. Note that not all answers in this thread use word boundaries around the letters, and thus they do not make the distinction shown in this example.

/[^aeoui]{6}/i : Match 6 or more consecutive non-vowels. Non-vowels here resolve exactly to consonants, because the previous regex selected for words made of letters only, that is, vowels and consonants.
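For comparison, roughly the same two-step filter can be sketched with grep alone. This is not part of the original answer; it assumes GNU grep, where \b is available as an extension, and reuses the sample lines from the example above.

```shell
# Same three sample lines as in_file.txt above.
sample=$(printf '%s\n' 'the abcdfghi aBcdfghi.' 'ABCDFGHI234' 'abcdEfgh')

# Step 1: extract letter-only words delimited by word boundaries
#         (ABCDFGHI234 is skipped because a digit follows the letters).
# Step 2: keep the words that contain six non-vowels in a row.
words=$(printf '%s\n' "$sample" | grep -Eo '\b[[:alpha:]]+\b' | grep -Ei '[^aeiou]{6}')

printf '%s\n' "$words"
```

On this input the pipeline should print abcdfghi and aBcdfghi, matching the Perl output.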

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

Barak Binyamin · Answer 4 · 2020-11-30T00:44:09.967

Get all words containing 6 or more consonants in a row from the .txt files in the current directory

cat *.txt | grep -Eo "\w+" | grep -E "[^AEOUIaeoui]{6,}"

We can use grep -Eo (-E enables extended regex, -o outputs ONLY the matching text)

cat *.txt will output all of the data from all txt files in the current directory
grep -Eo "\w+" will output all of the words from an input in the form of one word per line

We can use Regex to search for strings that contain a pattern:

[^LISTOFCHARACTERS] Any character but LISTOFCHARACTERS
{6,} 6 or more
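The whole pipeline can be tried out in a throwaway directory. This is a minimal sketch, assuming GNU grep (for \w) and mktemp; the file names and contents are hypothetical.

```shell
# Work in a temporary directory with two hypothetical sample files.
tmpdir=$(mktemp -d)
printf '%s\n' 'Many rhythms, few words.' > "$tmpdir/a.txt"
printf '%s\n' 'The catchphrase stuck.' > "$tmpdir/b.txt"

# Split every file into one word per line, then keep the words that
# contain a run of six or more non-vowel characters.
result=$(cd "$tmpdir" && cat *.txt | grep -Eo '\w+' | grep -E '[^AEOUIaeoui]{6,}')

printf '%s\n' "$result"   # rhythms, then catchphrase
rm -rf "$tmpdir"
```

Note that \w also matches digits and underscores, so a token such as abc12345 would count as having a long run of "non-vowels"; restricting the first grep to [[:alpha:]]+ avoids that.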
