I am trying to identify duplicate keys in a Unix file and sum up their values. For example, I have a file like:
aa, 05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03
And I need the output:
aa, 08
bb, 13
cc, 02
dd,03
This should do the trick:
$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file
bb,13
cc ,2
dd,3
aa,8
The order in which for (k in a) visits the keys is unspecified, so to get sorted output pipe it to sort:
$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file | sort
aa,8
bb,13
cc ,2
dd,3
See man sort for all the things sort can do.
If you want to clean up the whitespace around the commas, one method is to make the field separator a regex that also matches the surrounding spaces:
$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=' *, *' OFS=, file | sort
aa,8
bb,13
cc,2
dd,3
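The output requested in the question keeps two-digit, zero-padded values (08, 02), which plain print drops. A minimal sketch that may get closer to that, assuming the sums are non-negative integers and that a "key, NN" layout is wanted; the function name sum_by_key is hypothetical and only used for illustration:

```shell
# sum_by_key FILE  (hypothetical helper name)
# Sums the second comma-separated field per key and prints "key, NN",
# zero-padded to at least two digits, sorted by key.
sum_by_key() {
    awk -F' *, *' '
        { sum[$1] += $2 }    # accumulate the value for each key
        END {
            # printf with %02d restores the leading zero that print drops
            for (k in sum) printf "%s, %02d\n", k, sum[k]
        }
    ' "$1" | sort
}
```

Calling sum_by_key file on the sample input should then print aa, 08 / bb, 13 / cc, 02 / dd, 03, one per line. Sums of three or more digits are printed in full, since %02d only sets a minimum width.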