Count number of occurrences of a pattern in a file (even on same line)

Question

When searching for number of occurrences of a string in a file, I generally use:

grep pattern file | wc -l

However, this only counts one occurrence per line, because grep prints each matching line once no matter how many matches it contains. How can I count the number of times a string appears in a file, regardless of whether the occurrences are on the same or different lines?

Also, what if I'm searching for a regex pattern, not a simple string? How can I count those, or, even better, print each match on a new line?

hudolejev · Accepted Answer · 2014-12-31T10:07:22.393

165

To count all occurrences, use -o. Try this:

echo afoobarfoobar | grep -o foo | wc -l

And man grep of course (:
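For the regex part of the question, the same idea carries over with -E. A minimal sketch, assuming GNU grep or another grep that supports -o and -h; the helper name count_matches and the sample input are made up for illustration:

```shell
# count_matches PATTERN [FILE...]: hypothetical helper, not part of grep.
# Prints the number of non-overlapping matches of an extended regex.
count_matches() {
    pattern=$1
    shift
    # -o prints each match on its own line; -h drops file name prefixes;
    # -e protects patterns that start with a dash.
    grep -ohE -e "$pattern" "$@" | wc -l
}

# Print each match on a new line (prints 12, 345 and 6):
printf 'ab12cd345\nno digits here\n6\n' | grep -oE '[0-9]+'

# Count the matches (prints 3, possibly padded by wc):
printf 'ab12cd345\nno digits here\n6\n' | count_matches '[0-9]+'
```

With no FILE arguments the helper reads standard input, just like grep itself.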

Update

Some suggest using just grep -co foo instead of grep -o foo | wc -l.

Don't.

This shortcut won't work in all cases. The man page says:

-c print a count of matching lines

The difference between these approaches is illustrated below:

1.

$ echo afoobarfoobar | grep -oc foo
1

As soon as a match is found in the line (a{foo}barfoobar), grep counts the line and moves on. Only one line was checked and it matched, so the output is 1. The -o option is effectively ignored here, so plain grep -c gives the same result.

2.

$ echo afoobarfoobar | grep -o foo
foo
foo

$ echo afoobarfoobar | grep -o foo | wc -l
2

Two matches are found in the line (a{foo}bar{foo}bar) because we explicitly asked to find every occurrence (-o). Every occurrence is printed on a separate line, and wc -l just counts the number of lines in the output.
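The same contrast on a multi-line input, plus one caveat worth knowing: grep -o reports non-overlapping matches, scanning left to right. A small sketch with a made-up sample:

```shell
# Three lines; "foo" occurs four times on two of them.
sample=$(printf 'afoobarfoobar\nnothing here\nfoo foo\n')

# Matching lines: prints 2
printf '%s\n' "$sample" | grep -c foo

# Occurrences: prints 4 (some wc implementations pad the number)
printf '%s\n' "$sample" | grep -o foo | wc -l

# Overlapping occurrences are not counted separately:
# "aa" starts at three positions in "aaaa", but this prints 2.
echo aaaa | grep -o aa | wc -l
```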

edited Dec 31 '14 at 10:07

answered May 26 '10 at 12:03

hudolejev


Wow... is it really that simple? – jrdioko May 28 '10 at 21:58
grep -oc does not work in this case. Try echo afoobarfoobar | grep -oc foo – Paulus Sep 17 '14 at 08:37
Is there no way to do this for multiple files? Let's say I want to see the number of occurrences per file on a set of files. I can do it *per line* with grep -c *, but not per instance. – Keith Tyler Apr 11 '17 at 23:13
`grep -o foo a.txt b.txt | sort | uniq -c` works just fine (with GNU grep): https://gist.github.com/hudolejev/81a05791f38cbacfd4de3ee3b44eb4f8 – hudolejev Apr 13 '17 at 08:00
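For the per-file question raised in the comments, one possible sketch loops over the files, so that files without any match still report 0 (the sort | uniq -c pipeline prints nothing for such files). The function name and the sample files are made up for illustration:

```shell
# count_per_file PATTERN FILE...: hypothetical helper for illustration.
count_per_file() {
    pattern=$1
    shift
    for f in "$@"; do
        # Feed the file on stdin so grep prints no file name prefix;
        # $(( )) strips the padding some wc implementations add.
        n=$(( $(grep -o -e "$pattern" < "$f" | wc -l) ))
        printf '%s: %s\n' "$f" "$n"
    done
}

# Demo in a throwaway directory.
dir=$(mktemp -d)
printf 'afoobarfoobar\n' > "$dir/a.txt"
printf 'foo\nbar\n'      > "$dir/b.txt"
printf 'nothing\n'       > "$dir/c.txt"
count_per_file foo "$dir"/*.txt   # counts are 2, 1 and 0
rm -r "$dir"
```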

score 2 · Answer 2 · edited May 21 '18 at 10:49

Try this:

grep "string to search for" FileNameToSearch | cut -d ":" -f 4 | sort -n | uniq -c

Sample:

grep "SMTP connect from unknown" maillog | cut -d ":" -f 4 | sort -n | uniq -c
  6  SMTP connect from unknown [188.190.118.90]
 54  SMTP connect from unknown [62.193.131.114]
  3  SMTP connect from unknown [91.222.51.253]

edited May 21 '18 at 10:49

Sergey Vyacheslavovich Brunov

answered Jan 29 '15 at 14:30

IBrewThereforeIAm

score 2 · Answer 3 · answered Aug 15 '18 at 11:41

Ripgrep, a fast alternative to grep, introduced the --count-matches flag in version 0.9; it counts every match rather than every matching line (I'm using the example above to stay consistent):

> echo afoobarfoobar | rg --count foo
1
> echo afoobarfoobar | rg --count-matches foo
2

As the OP asked, ripgrep accepts regex patterns as well (--regexp <PATTERN>). By default it prints each matching line on a separate line:

> echo -e "line1foo\nline2afoobarfoobar" | rg foo
line1foo
line2afoobarfoobar
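To print each match itself rather than each matching line, ripgrep also has -o/--only-matching, mirroring grep -o. A short sketch, assuming rg is installed (the guard simply skips the demo otherwise):

```shell
if command -v rg >/dev/null 2>&1; then
    # One match per output line, like grep -o.
    printf 'line1foo\nline2afoobarfoobar\n' | rg -o foo
    # Count all matches across all lines.
    printf 'line1foo\nline2afoobarfoobar\n' | rg --count-matches foo
fi
```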

score 1 · Answer 4 · answered Nov 12 '12 at 22:24

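A belated post:
Use the search regex as the Record Separator (RS) in awk. A regex RS works in gawk and mawk; POSIX leaves a multi-character RS unspecified.
This allows your regex to span \n-delimited lines (if you need that). N matches split the input into N+1 records, hence NR-1; this assumes some text (such as a trailing newline) follows the last match.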

printf 'X \n moo X\n XX\n' |
   awk -v RS='X[^X]*X' 'END{print (NR<2?0:NR-1)}'
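If matches never need to span lines, a more portable alternative to the regex RS trick is to let gsub do the counting, since it returns the number of substitutions it made. A minimal sketch, not the answer's method; the helper name count_with_gsub is made up:

```shell
# count_with_gsub PATTERN [FILE...]: hypothetical helper for illustration.
count_with_gsub() {
    pattern=$1
    shift
    # gsub returns how many replacements it made on the current line;
    # replacing each match with itself ("&") leaves the text unchanged.
    # n + 0 forces a numeric 0 when there was no match at all.
    awk -v re="$pattern" '{ n += gsub(re, "&") } END { print n + 0 }' "$@"
}

printf 'afoobarfoobar\nfoo\nbar\n' | count_with_gsub 'foo'   # prints 3
```

Note that awk applies escape processing to values passed with -v, so backslashes in the pattern need doubling.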

answered Nov 12 '12 at 22:24

Peter.O

score -1 · Answer 5 · answered May 25 '10 at 22:05

Hack grep's color function, and count how many color tags it prints out:

echo -e "a\nb  b b\nc\ndef\nb e brb\nr" \
| GREP_COLOR="033" grep --color=always  b \
| perl -e 'undef $/; $_=<>; s/\n//g; s/\x1b\x5b\x30\x33\x33/\n/g; print $_' \
| wc -l

answered May 25 '10 at 22:05

Shizzmo
