How to identify duplicates in a unix file and sum up the values

Question

I am trying to identify duplicate keys in a unix file and sum up their values. For example:

I have a file like:

aa,  05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03

And I need the output:

aa, 08
bb, 13
cc, 02
dd,03

Chris Seymour · Accepted Answer · 2014-06-29T19:47:28.040

This should do the trick:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file
bb,13
cc ,2
dd,3
aa,8
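The one-liner is terse, so here is the same logic spelled out with comments. This is only a sketch: the variable names `total`, `key` and `sums` are my own, and the sample data is fed in with `printf` instead of being read from `file`.

```shell
# Sample data from the question, including its irregular whitespace.
sums=$(printf 'aa,  05\naa, 02\naa, 01\nbb, 01\nbb, 12\ncc ,02\ndd, 03\n' | awk '
BEGIN { FS = ","; OFS = "," }   # split input and join output on commas
{ total[$1] += $2 }             # $2 is coerced to a number, so "  05" counts as 5
END {
    for (key in total)          # iteration order is unspecified in awk
        print key, total[key]
}')
echo "$sums"
```

Note that the key for `cc ,02` is `cc ` with a trailing space, because the separator here is the bare comma.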

The order in which for(k in a) visits the keys is unspecified, so if you need sorted output, pipe the result to sort:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file | sort
aa,8
bb,13
cc ,2
dd,3

See man sort for all the things sort can do.
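As one example of what sort can do, you could order the lines by the summed value instead of by key. A small sketch, assuming you want the largest total first (that requirement is my assumption, not something the question asks for); `totals` and `by_total` are just illustrative variable names:

```shell
# key,total pairs like those printed by the awk command above.
totals=$(printf 'bb,13\ncc ,2\ndd,3\naa,8\n')

# -t, sets the field delimiter to a comma;
# -k2,2nr sorts on field 2 only, numerically, in reverse (largest first).
by_total=$(printf '%s\n' "$totals" | sort -t, -k2,2nr)
echo "$by_total"
```

With the sample data this puts bb first and cc last.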

If you want to clean up the whitespace around the commas, one method is to make the field separator a regular expression that swallows the surrounding spaces:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=' *, *' OFS=, file | sort
aa,8
bb,13
cc,2
dd,3
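The desired output in the question keeps the leading zeros (08, 02, 03), which the commands above drop because awk treats the values as numbers. If the padding matters, printf can put it back. A sketch, assuming a fixed minimum width of two digits and a ", " separator, both inferred from the sample output in the question; the temporary file stands in for `file`:

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
aa,  05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03
EOF

# Sum per key, then print each total zero-padded to at least two digits.
# The %02d width and the ", " separator are assumptions taken from the question's sample output.
result=$(awk -F' *, *' '{a[$1]+=$2} END{for(k in a) printf "%s, %02d\n", k, a[k]}' "$input" | sort)
echo "$result"
rm -f "$input"
```

Totals of 100 or more would simply print with all their digits, since %02d only sets a minimum width.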
