
I am trying to identify duplicate keys in a Unix file and sum their values. For example:

I have a file like:

aa,  05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03

And I need the output:

aa, 08
bb, 13
cc, 02
dd, 03
Chris Seymour
  • 83,387
  • 30
  • 160
  • 202



This should do the trick:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file
bb,13
cc ,2
dd,3
aa,8
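Here is the same one-liner written out as a commented script. It is a sketch that recreates the question's sample data in a file named `file` (the name the answer's commands use) via a here-document. The awk program is unchanged. Because `FS=,` splits only on the comma, the key for `cc ,02` keeps its trailing space, which is why `cc ,2` appears in the output.

```shell
#!/bin/sh
# Recreate the sample input from the question in a file called "file".
cat > file <<'EOF'
aa,  05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03
EOF

awk '
  # For every line: $1 is the text before the comma, $2 the text after it.
  # Using $2 in an addition converts " 05" to the number 5, so leading
  # blanks and leading zeros are dropped.
  { a[$1] += $2 }

  # After the last line, print each distinct key with its total.
  # "for (k in a)" visits keys in an unspecified order, which is why the
  # output is piped to sort when a stable order is needed.
  END { for (k in a) print k, a[k] }
' FS=, OFS=, file
```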

The order of `for (k in a)` is unspecified, so to get sorted output pipe to sort:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=, OFS=, file | sort    
aa,8
bb,13
cc ,2
dd,3

See man sort for the full set of sorting options.

To strip the whitespace around the commas, one method is to make the field separator a regular expression that matches a comma together with any surrounding spaces:

$ awk '{a[$1]+=$2}END{for(k in a)print k,a[k]}' FS=' *, *' OFS=, file | sort 
aa,8
bb,13
cc,2
dd,3
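The question's expected output keeps two-digit values with a space after the comma (`aa, 08`). Here is a minimal sketch that reproduces that format, assuming it is what's wanted. It keeps the regex separator from the previous command and replaces `print` with `printf` and a `%02d` conversion. `LC_ALL=C` is an addition that makes the sort order independent of the current locale.

```shell
#!/bin/sh
# Sample input from the question.
cat > file <<'EOF'
aa,  05
aa, 02
aa, 01
bb, 01
bb, 12
cc ,02
dd, 03
EOF

# FS=' *, *' splits on a comma with any spaces around it, so "cc ,02"
# yields the key "cc". %02d pads totals below 10 to two digits (8 -> 08).
awk '{ a[$1] += $2 }
     END { for (k in a) printf "%s, %02d\n", k, a[k] }' FS=' *, *' file |
  LC_ALL=C sort
```

With the sample file this prints `aa, 08`, `bb, 13`, `cc, 02` and `dd, 03`, one per line.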