1

I'm trying to sum up the values of column 2 where column 1 is a duplicate value, however my google search is off as I get results on how to add columns or sum whole rows not where a value matches.

Can someone confirm or link me to where I can find out how to do this? I've got to the point of organising the data but the final step eludes my search engine.

Current code

cat example.csv | sort | ##pipe of the thing that sums > output.csv

example.csv

platform1,24257022
platform2,44959636
platform_3,62
platform2,2
platform_3,20
platform1,572475
platform_3,75

desired output.csv

platform1,24829497
platform2,44959638
platform_3,157

Apologies that this is such a basic question I'm asking...

El_Birdo
  • 315
  • 4
  • 19
  • 2
    This should do it for you: `awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for(i in a) print i,a[i]}' file.csv`. The output will be unsorted, but I believe that should be fine. – kvantour Jan 13 '20 at 11:42
  • I marked this as a duplicate, the only difference is the field separator. – kvantour Jan 13 '20 at 12:57

2 Answers2

4

Could you please try following.

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for(i in a){print i,a[i]}}' Input_file

Explanation: Adding detailed explanation of above code here.

awk '                  ##Starting awk program from here.
BEGIN{                 ##Starting BEGIN section of this awk program from here.
  FS=OFS=","           ##Setting FS and OFS as comma here.
}                      ##Closing BEGIN block of this code here.
{                      ##Starting main block for this code here.
  a[$1]+=$2            ##Creating an array named a whose index is $1 and value is $2 which keep adding to its own value.
}                      ##Closing main block of this program here.
END{                   ##Starting END block of this program here.
  for(i in a){         ##Traversing through array a all elements here.
    print i,a[i]       ##Printing index of element and value of element here.
  }                    ##Closing block for, above for loop here.
}                      ##Closing BLOCK for this program END section here.
'  Input_file          ##Mentioning Input_file name here.


2nd solution: In case you want the output in same order in which 1st field has occurred then try following since above solution will not take care of sequence.

awk '
BEGIN{
  FS=OFS=","
}
!a[$1]++{
  b[++count]=$1
}
{
  c[$1]+=$2
}
END{
  for(i=1;i<=count;i++){
    print b[i],c[b[i]]
  }
}
'  Input_file
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • 1
    I can confirm that this worked. I've got a time delay before I can accept the answer but will do. – El_Birdo Jan 13 '20 at 11:46
2

Another way, using the always-useful GNU datamash:

$ datamash -t, -s -g1 sum 2 < example.csv
platform1,24829497
platform2,44959638
platform_3,157

(Using comma as a field delimiter (-t,), sort (-s, needed for unsorted input like yours) and group by the first column (-g1) and sum the second column of each group.

Shawn
  • 47,241
  • 3
  • 26
  • 60