
I have a data stream like the following:

A,1
A,3
B,4
B,2
C,1
D,5

... and so on. I want to merge the lines that share the same value in column 1, summing their values in column 2. The result should look like this:

A,4
B,6
C,1
D,5

It looks like a typical MapReduce job, but I want to know whether there is a command or bash tool that can do this in one or two lines. The file I'm working with is only 3-4 KB.

gonephishing
  • http://unix.stackexchange.com/q/167280/67817 – Tom Fenech Apr 12 '16 at 12:57
  • @TomFenech, it's a similar type of question but doesn't have an identical solution. And so this question adds to the diversity set of these type of question. Thanks for pointing me to that question but I think you should reconsider your downvote. – gonephishing Apr 12 '16 at 21:00
  • I dunno what makes you think I voted on your question - I just linked to a solution to your problem. – Tom Fenech Apr 12 '16 at 21:06

3 Answers


awk to the rescue!

There are many variations, but this one expects input in which lines with the same key are adjacent (for example, sorted input), and it preserves the order of the keys.

awk -F, -v OFS=, '$1==p{a+=$2} $1!=p{if(p) print p,a; p=$1; a=$2} END{print p,a}' file

A,4
B,6
C,1
D,5
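
If the input is not already grouped by key, one option is to sort it on the first field before running the same awk program. This is a minimal sketch, not part of the original answer; the file name `unsorted.csv` and its contents are made up for illustration.

```shell
# Hypothetical sample: the same records as in the question, but with keys out of order.
printf '%s\n' 'B,4' 'A,1' 'D,5' 'A,3' 'C,1' 'B,2' > unsorted.csv

# Sort on the first comma-separated field so equal keys become adjacent,
# then sum each run of equal keys with the awk program shown above.
result=$(sort -t, -k1,1 unsorted.csv |
  awk -F, -v OFS=, '$1==p{a+=$2} $1!=p{if(p) print p,a; p=$1; a=$2} END{print p,a}')

printf '%s\n' "$result"
```

With this sample the pipeline prints the same four lines as above. Sorting changes the key order to sorted order, so the "keeps the order of the keys" property only holds relative to the sorted stream.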
karakfa

Aho, Weinberger, and Kernighan are your friends here. They wrote AWK back in 1977 to deal with exactly this class of problems.
The code below will achieve your goal if your data stream is in a file called data.
awk -F"," '{ a[$1] += $2 } END { for (i in a) { printf "%s,%d\n",i,a[i]; } }' data
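
One caveat with the array approach: awk's `for (i in a)` visits keys in an unspecified order, so the output lines may not come out sorted. A minimal sketch of one way to get a deterministic order is to pipe the result through `sort`; the sample file below simply recreates the data from the question.

```shell
# Recreate the sample data from the question in a file called data.
printf '%s\n' 'A,1' 'A,3' 'B,4' 'B,2' 'C,1' 'D,5' > data

# Sum column 2 per key in an associative array, then sort the output by key,
# because the traversal order of "for (i in a)" is not guaranteed.
summed=$(awk -F, '{ a[$1] += $2 } END { for (i in a) printf "%s,%d\n", i, a[i] }' data |
  sort -t, -k1,1)

printf '%s\n' "$summed"
```

Unlike the run-based variant, this does not need equal keys to be adjacent in the input, at the cost of holding one array entry per distinct key in memory.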

Niall Cosgrove
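# Step 1: rewrite each "key,value" line as a shell assignment that adds the value to a variable named after the key.
sed -e "s/\(.*\),\(.*\)/\1=\`expr $\1 + \2\`/" file
#A=`expr $A + 1`
#A=`expr $A + 3`
#B=`expr $B + 4`
#B=`expr $B + 2`
#C=`expr $C + 1`
#D=`expr $D + 5`

# Step 2: emit one echo per distinct key (uniq requires equal keys to be on adjacent lines).
cut -d',' -f1 file | uniq | sed 's/\(.*\)/echo \1,$\1/'
#echo A,$A
#echo B,$B
#echo C,$C
#echo D,$D

# Step 3: concatenate the two generated scripts and run them with sh.
# Keys must be valid shell variable names for this to work.
( sed -e "s/\(.*\),\(.*\)/\1=\`expr $\1 + \2\`/" file ; cut -d',' -f1 file | uniq | sed 's/\(.*\)/echo \1,$\1/' ) | sh -s
#A,4
#B,6
#C,1
#D,5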
Ali ISSA