
I have multiple files that look like this: (format: <string>,<number>).

For example:

a,5
b,2
c,3

I want to sort and sum all of them so the final output will be

<string>,<sum of all numbers>

For example, given two files:

first file:

a,5
b,2
c,3

second file:

a,1
b,2

output:

a,6
b,4
c,3

First I would use cat * and then sort, but what should I use next?

TTaJTa4

2 Answers


Using awk:

$ awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for(i in a)print i,a[i]}' file1 file2
a,6
b,4
c,3

The output order of for(i in a) is unspecified in awk. Pipe the result through sort, for example, if you need it ordered.
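As a minimal sketch of ordering the result, the same awk command can be piped through sort keyed on the first comma-separated field. The scratch directory and the names file1 and file2 below are only assumptions for the demo, filled with the sample data from the question:

```shell
# Hypothetical scratch directory and file names, used only for this demo.
tmpdir=$(mktemp -d)
printf 'a,5\nb,2\nc,3\n' > "$tmpdir/file1"
printf 'a,1\nb,2\n' > "$tmpdir/file2"

# Sum column 2 per key, then sort on the first comma-separated field.
result=$(awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for(i in a)print i,a[i]}' \
    "$tmpdir/file1" "$tmpdir/file2" | sort -t, -k1,1)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this should print a,6 then b,4 then c,3, one per line.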

Edit: In response to the comment (keep the largest number per key instead of the sum):

$ awk 'BEGIN{FS=OFS=","}{if(($1 in a)==0||a[$1]<$2)a[$1]=$2}END{for(i in a)print i,a[i]}' file1 file2
a,5
b,2
c,3
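The largest-value one-liner above can be spelled out as a commented sketch. The array name max, the variable value, the scratch directory, and the trailing sort are illustrative additions, not part of the original command; adding 0 to $2 is a defensive way to make sure the comparison is numeric.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
tmpdir=$(mktemp -d)
printf 'a,5\nb,2\nc,3\n' > "$tmpdir/file1"
printf 'a,1\nb,2\n' > "$tmpdir/file2"

max_result=$(awk '
BEGIN { FS = OFS = "," }
{
    value = $2 + 0    # adding 0 forces a numeric value
    # Store the value if the key is new or the value beats the stored one.
    if (!($1 in max) || value > max[$1])
        max[$1] = value
}
END { for (key in max) print key, max[key] }
' "$tmpdir/file1" "$tmpdir/file2" | sort -t, -k1,1)

printf '%s\n' "$max_result"
rm -rf "$tmpdir"
```

With the sample data this should print a,5 then b,2 then c,3, one per line.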
James Brown
  • What if I wanted to get the larger number instead? I mean, given a,4 and a,2 it should print a,4 (and not the sum). – TTaJTa4 Mar 29 '18 at 13:30

Awk is a powerful tool for this kind of task, and it is worth going through a tutorial. Below are some examples specific to your needs that also illustrate the basic concepts of awk.

Assuming your file is named file.txt

a,5
b,2
c,3

You can use the following:

 awk -F, '{print $2}' file.txt

This prints the second column of file.txt, using ',' as the field delimiter.

To sum a column, you can use the following:

awk -F, '{ total += $2; } END {print total}' file.txt

That is, the variable total accumulates the value of the second column on every line, and the END block prints it once all input has been read.
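To try the two commands above end to end, here is a small sketch that first creates file.txt with the sample contents. The scratch directory and the shell variable names are assumptions made for the demo.

```shell
# Hypothetical scratch directory holding the sample file.txt.
tmpdir=$(mktemp -d)
printf 'a,5\nb,2\nc,3\n' > "$tmpdir/file.txt"

# Print the second column only.
second_column=$(awk -F, '{print $2}' "$tmpdir/file.txt")

# Accumulate the second column and print the total at the end.
total=$(awk -F, '{ total += $2 } END { print total }' "$tmpdir/file.txt")

printf 'second column:\n%s\n' "$second_column"
printf 'total: %s\n' "$total"
rm -rf "$tmpdir"
```

For the sample file the total should be 10 (5 + 2 + 3).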

Finally, you can use the following:

awk 'BEGIN{FS=OFS=","}{a[$1]+=$2}END{for(i in a)print i,a[i]}' file1 file2 ... fileN

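The BEGIN block sets two built-in awk variables to a comma: FS, the input field separator, and OFS, the output field separator. The main block then adds the second column to an array element keyed by the first column, and the END block loops over the keys and prints each key with its sum.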
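To see the difference between FS and OFS, here is a small sketch that reads comma-separated input but writes tab-separated output. The choice of a tab and the variable name converted are only for illustration.

```shell
# FS controls how input lines are split; OFS is placed between the
# comma-separated arguments of print.
converted=$(printf 'a,5\nb,2\n' |
    awk 'BEGIN { FS = ","; OFS = "\t" } { print $1, $2 }')

printf '%s\n' "$converted"
```

Each output line should contain the key and the number separated by a tab instead of a comma.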

Note: sorting the input is not needed, since the sum is accumulated per key. The output order is not guaranteed, though, so pipe the result through sort if you need it ordered. Also note that ... fileN stands for the N files you pass to the command.

Kenny Alvizuris