
I want to sum up the occurrence counts output by `uniq -c`. How can I do that on the command line?

For example, if I get the following output, I would need 250.

 45 a4
 55 a3
  1 a1
149 a5
user3144923
    The summed-up value will be the total number of a1 a3 a4 a5 lines; instead of `uniq -c` and then summing up, why don't you just `wc -l` to get the total number? – ray Jan 04 '14 at 11:47

5 Answers

awk '{sum+=$1} END{ print sum}'
Thorsten Staerk
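To see the one-liner end to end, here is a quick sketch with made-up sample input (any file or pipeline producing `uniq -c`-style output works the same way):

```shell
# uniq -c collapses adjacent duplicate lines into "count value" pairs;
# awk then sums the first (count) column.
printf 'a1\na3\na3\na4\na4\na4\n' | uniq -c | awk '{sum+=$1} END{print sum}'
# prints 6  (1 + 2 + 3)
```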

This should do the trick:

awk '{s+=$1} END {print s}' file

Or just pipe it into awk with

uniq -c whatever | awk '{s+=$1} END {print s}'
Jens

For each line, add the value of the first column to SUM, then print the value of SUM.

awk is a better choice:

uniq -c somefile | awk '{SUM+=$1}END{print SUM}'

but you can also implement the logic in bash. Note that piping into `while` runs the loop in a subshell, so accumulate and print the sum in the same shell:

uniq -c somefile | {
   SUM=0
   while read num other
   do
      let SUM+=num
   done
   echo $SUM
}
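One bash caveat worth knowing (a sketch with made-up inline data standing in for `uniq -c` output): a pipe runs the `while` loop in a subshell, so a sum accumulated there is lost unless it is printed in the same shell:

```shell
# Buggy: the loop runs in a subshell created by the pipe,
# so SUM is unset afterwards in the parent shell.
printf '1 a\n2 b\n' | while read num other; do SUM=$((SUM + num)); done
echo "after pipe: ${SUM:-unset}"        # prints "after pipe: unset"

# Works: accumulate and print inside the same group on the pipe's right side.
printf '1 a\n2 b\n' | { SUM=0; while read num other; do SUM=$((SUM + num)); done; echo "$SUM"; }
# prints 3
```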
ray

While the aforementioned answer uniq -c example-file | awk '{SUM+=$1}END{print SUM}' works to sum the left column of uniq -c output, so would wc -l example-file, as mentioned in the comment.

If what you are looking for is the number of unique lines in your file, then you can use this command:

sort -h example-file | uniq | wc -l
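A small illustration (sample data assumed) of why the sort step matters: `uniq` only collapses *adjacent* duplicates, so unsorted input overcounts:

```shell
# Unsorted: the duplicates are not adjacent, so uniq keeps all 4 lines.
printf 'b\na\nb\na\n' | uniq | wc -l          # prints 4

# Sorted first: duplicates collapse, giving the number of unique lines.
printf 'b\na\nb\na\n' | sort | uniq | wc -l   # prints 2
```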

yosefrow

uniq -c is slow compared to awk. Like REALLY slow.

{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" }       # modify FS for the column
       { freqL[$1]++ }                        # you want to uniq -c upon
 END   { for (x in freqL) { printf("%8s %s\n", freqL[x], x) } }'

If your input isn't large (say 100 MB+), then gawk suffices after adding in

   PROCINFO["sorted_in"] = "@ind_num_asc"  # gawk-specific; just use gawk -b mode

If it's really large, it's far faster to use mawk2 and then pipe to

   { mawk/mawk2 stuff... } | gnusort -t'\t' -k 2,2
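A concrete, runnable version of the sketch above (a hedged example with made-up input; plain `awk` used here, counting whole lines via `$0` instead of the templated column):

```shell
# Emulate `sort | uniq -c`: count occurrences of each whole line in awk,
# then sort the "count value" output by the value column.
printf 'b\na\nb\na\nb\n' |
awk '{ freq[$0]++ } END { for (x in freq) printf("%8s %s\n", freq[x], x) }' |
sort -k 2,2
# prints:
#        2 a
#        3 b
```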
RARE Kpop Manifesto