58

I’m trying to count a particular word occurrence in a whole directory. Is this possible?

Say for example there is a directory with 100 files all of whose files may have the word “aaa” in them. How would I count the number of “aaa” in all the files under that directory?

I tried something like:

 zegrep "xception" `find . -name '*auth*application*'` | wc -l

But it’s not working.

tchrist
  • 78,834
  • 30
  • 123
  • 180
Ashish Sharma
  • 1,597
  • 7
  • 24
  • 35

8 Answers

117

grep -roh aaa . | wc -w

Recursively grep all files and directories under the current directory for aaa, outputting only the matches rather than the entire lines. Then just use wc to count how many words there are.

Carlos Campderrós
  • 22,354
  • 11
  • 51
  • 57
  • 1
    Also if you don't want the actual matches, only the count, you can use `grep -rcP '^aaa$' .` That saves you the piping and prevents getting embedded 'aaa' – cgledezma Jul 30 '13 at 09:56
  • @cgledezma good point about `-c`, but it fails if there are two or more occurrences of the searchString in one line. – Carlos Campderrós Jul 30 '13 at 10:50
  • 2
    Mm... Indeed, I hadn't noticed it only counts the number of lines matched, not the actual number of matches. Still, I think it may be useful to place the word boundaries to avoid nested matches. Sorry, I placed them incorrectly in the previous comment: `grep -rohP '\baaa\b' . | wc -w` – cgledezma Jul 30 '13 at 11:41
  • @cgledezma sure, word boundaries may be useful in some situations – Carlos Campderrós Jul 30 '13 at 13:34
  • On OS X, @cgledezma's solution translates to `grep -rohe '\baaa\b' . | wc -w`, since `-P` is not available. – IanBussieres May 07 '15 at 16:14
  • One thing to also note: if you search for a pattern that has a space between multiple words or letters, e.g. `grep -roh 'global \$' .` or `grep -roh 'one two' .`, then piping to `wc -w` will count all of the words in the output, not the number of matches. You may want to count only the number of exact matches. I achieved this by piping into grep again, searching for the first word only, e.g. `grep -roh 'global \$' . | grep -o 'global' | wc -w`. There may be a more elegant way, though? – mrjamesmyers Mar 06 '18 at 16:49
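Following up on the comment above: since `grep -o` prints each match on its own line, piping to `wc -l` instead of `wc -w` counts matches directly, even when the pattern contains spaces. A minimal sketch, assuming GNU grep:

```shell
# grep -o prints each match on its own line, so counting lines
# counts matches correctly even for patterns containing spaces.
grep -roh 'one two' . | wc -l
```

This avoids the second grep pass entirely, because one output line always corresponds to exactly one match.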
10

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

It should correctly handle filenames with spaces in them.
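A small variation on the same idea (a sketch, assuming a POSIX-2008-compliant find): terminating `-exec` with `+` instead of `\;` batches many files into each grep invocation rather than spawning one process per file, and `-h` suppresses the filename prefix that grep adds when given multiple files:

```shell
# {} + passes many files per grep invocation (fewer processes);
# -h drops the per-file name prefix so only the matches remain.
find . -type f -exec grep -oh aaa {} + | wc -l
```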

Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130
  • perfect! I was using find based on size, this works perfectly – SeanDowney Sep 29 '11 at 16:11
  • @Fredrik: this executes perfectly, but is there a way to get the word count while avoiding multiple counts for that word in the same file? E.g., if the word "aaa" appears in "file1.txt" 10 times, the count should increase only by 1, not 10, and similarly for the other files in the directory. – annunarcist Nov 08 '13 at 21:23
  • @annunarcist -- yes it can be done. Post a new question and I'll take a look :-) – Fredrik Pihl Nov 08 '13 at 21:31
  • @Fredrik : posted! Here is the [link](http://stackoverflow.com/questions/19869272/how-to-count-occurrences-of-a-word-in-all-the-files-of-a-directory-but-with-cou) – annunarcist Nov 08 '13 at 21:59
8

Use grep in its simplest form. Run grep --help for more info.


  1. To get the count of a word in a particular file:

    grep -c <word> <file_name>
    

    Example:

    grep -c 'aaa' abc_report.csv
    

    Output:

    445
    

  2. To get the count of a word in the whole directory:

    grep -c -R <word>
    

    Example:

    grep -c -R 'aaa'
    

    Output:

    abc_report.csv:445
    lmn_report.csv:129
    pqr_report.csv:445
    my_folder/xyz_report.csv:408
    
Parag Tyagi
  • 8,780
  • 3
  • 42
  • 47
4

Let's use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (and its sub-directories), you can do this:

$ find . -type f | xargs cat | wordfrequency | grep yourword

Source: AWK-ward Ruby

ack
  • 7,356
  • 2
  • 25
  • 20
Sheharyar
  • 73,588
  • 21
  • 168
  • 215
1
find . -type f | xargs perl -pe 's/ /\n/g' | grep aaa | wc -l
Vijay
  • 65,327
  • 90
  • 227
  • 319
0

cat the files together and grep the output: cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.

jcomeau_ictx
  • 37,688
  • 6
  • 92
  • 107
0

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep '^file$' | wc -l
4

The sed converts spaces to newlines (you may want to include other space characters as well such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.

Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
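An equivalent sketch using `tr` instead of `sed`, which squeezes every run of whitespace (spaces, tabs, and newlines alike) into a single newline in one step, with `grep -c` doing the counting:

```shell
# tr -s collapses runs of any whitespace into single newlines;
# grep -c then counts the lines that are exactly the word.
cat * | tr -s '[:space:]' '\n' | grep -c '^aaa$'
```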

If you wanted a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
0

There's also a grep regex syntax for matching words only:

# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l

For a different word matching regex syntax see:

man re_format | less -p '\[\[:<:\]\]'
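GNU and BSD grep also have a `-w` flag that restricts matches to whole words, which saves typing the boundary escapes by hand:

```shell
# -w anchors the pattern at word boundaries, like \<aaa\> but portable
# between GNU and BSD grep.
grep -rohw 'aaa' . | wc -l
```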
tim
  • 1