
I want to sum up the occurrence counts output by `uniq -c`. How can I do that on the command line?

For example, if I get the following output, I would need 250.

 45 a4
 55 a3
  1 a1
149 a5
user3144923
    The summed-up value will be the total number of a1 a3 a4 a5 lines; instead of `uniq -c` and then summing up, why don't you just `wc -l` to get the total number? – ray Jan 04 '14 at 11:47

5 Answers

awk '{sum+=$1} END{ print sum}'
Thorsten Staerk
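To see the one-liner end to end, here is a quick sketch with made-up sample input (any file or pipeline producing `uniq -c`-style output works the same way):

```shell
# uniq -c collapses adjacent duplicate lines into "count value" pairs;
# awk then sums the first (count) column.
printf 'a1\na3\na3\na4\na4\na4\n' | uniq -c | awk '{sum+=$1} END{print sum}'
# prints 6  (1 + 2 + 3)
```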

This should do the trick:

awk '{s+=$1} END {print s}' file

Or just pipe it into awk with

uniq -c whatever | awk '{s+=$1} END {print s}'
Jens

For each line, add the value of the first column to SUM, then print the value of SUM.

awk is a better choice:

uniq -c somefile | awk '{SUM+=$1}END{print SUM}'

but you can also implement the logic in bash. Note that piping into `while` runs the loop in a subshell, so accumulate and print the sum in the same shell:

uniq -c somefile | {
   SUM=0
   while read num other
   do
      let SUM+=num
   done
   echo $SUM
}
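One bash caveat worth knowing (a sketch with made-up inline data standing in for `uniq -c` output): a pipe runs the `while` loop in a subshell, so a sum accumulated there is lost unless it is printed in the same shell:

```shell
# Buggy: the loop runs in a subshell created by the pipe,
# so SUM is unset afterwards in the parent shell.
printf '1 a\n2 b\n' | while read num other; do SUM=$((SUM + num)); done
echo "after pipe: ${SUM:-unset}"        # prints "after pipe: unset"

# Works: accumulate and print inside the same group on the pipe's right side.
printf '1 a\n2 b\n' | { SUM=0; while read num other; do SUM=$((SUM + num)); done; echo "$SUM"; }
# prints 3
```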
ray

While the aforementioned answer uniq -c example-file | awk '{SUM+=$1}END{print SUM}' works to sum the left column of uniq -c output, so would wc -l example-file, as mentioned in the comment.

If what you are looking for is the number of unique lines in your file, then you can use this command:

sort -h example-file | uniq | wc -l
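A small illustration (sample data assumed) of why the sort step matters: `uniq` only collapses *adjacent* duplicates, so unsorted input overcounts:

```shell
# Unsorted: the duplicates are not adjacent, so uniq keeps all 4 lines.
printf 'b\na\nb\na\n' | uniq | wc -l          # prints 4

# Sorted first: duplicates collapse, giving the number of unique lines.
printf 'b\na\nb\na\n' | sort | uniq | wc -l   # prints 2
```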

yosefrow

uniq -c is slow compared to awk. Like REALLY slow.

{mawk/mawk2/gawk} 'BEGIN { OFS = "\t" }       # modify FS for the column
       { freqL[$1]++ }                        # you want to uniq -c upon
 END   { for (x in freqL) { printf("%8s %s\n", freqL[x], x) } }'

If your input isn't large (say 100 MB+), then gawk suffices after adding in

   PROCINFO["sorted_in"] = "@ind_num_asc"  # gawk-specific; just use gawk -b mode

If it's really large, it's far faster to use mawk2 and then pipe to

   { mawk/mawk2 stuff... } | gnusort -t'\t' -k 2,2
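A concrete, runnable version of the sketch above (a hedged example with made-up input; plain `awk` used here, counting whole lines via `$0` instead of the templated column):

```shell
# Emulate `sort | uniq -c`: count occurrences of each whole line in awk,
# then sort the "count value" output by the value column.
printf 'b\na\nb\na\nb\n' |
awk '{ freq[$0]++ } END { for (x in freq) printf("%8s %s\n", freq[x], x) }' |
sort -k 2,2
# prints:
#        2 a
#        3 b
```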
RARE Kpop Manifesto