
I have a file with the following records:

$ cat sample.txt
ABC,100
XYZ,50
ABC,150
QWE,100
ABC,50
XYZ,100

The expected output is:

$ cat output.txt
ABC,300
XYZ,150
QWE,100

I tried the following script:

PREVVAL1=0
SUM1=0
cat sample.txt | sort > /tmp/Pos.part
while read line
do
VAL1=$(echo $line | awk -F, '{print $1}')
VAL2=$(echo $line | awk -F, '{print $2}')
if [ $VAL1 == $PREVVAL1 ]
then
SUM1=`expr $SUM1 + $VAL2`
PREVVAL1=$VAL1
echo $VAL1 $SUM1
else
SUM1=$VAL2
PREVVAL1=$VAL1
fi
done < /tmp/Pos.part

I would like a one-liner that produces the required output without a while loop. It should add up the numbers for all rows whose first column is the same and print each total on a single line.

steffen
Programmer
  • It would be easy using `awk`, just accumulate in an associative array. – Barmar Sep 26 '18 at 17:12
  • You don't need to use `awk` to split the line on comma. Use `while IFS=, read VAL1 VAL2` – Barmar Sep 26 '18 at 17:13
  • I want to avoid the while loop and use a one-liner instead. Is that possible? – Programmer Sep 26 '18 at 17:14
  • BTW, consider running your code through http://shellcheck.net/ and fixing what it finds. There's no reason for `expr` in modern code, and running a pipeline starting a new copy of `awk` for every single line you process is *insanely* inefficient. – Charles Duffy Sep 26 '18 at 17:24

3 Answers

2
awk -F, '{a[$1]+=$2} END{for (i in a) print i FS a[i]}' sample.txt

Output

QWE,100
XYZ,150
ABC,300

The first block runs for every input line and adds the second field to an associative array element keyed by the first field. The END block runs once after the last line and prints each key with its total.
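Note that the output above is not in the order the question asked for: the order in which `for (i in a)` visits the keys is unspecified. If the keys should come out in the order they first appear in the file, one possible sketch is to record each new key in a second, numerically indexed array. The function name `sum_in_order` and the array names are made up for this example:

```shell
# sum_in_order is a hypothetical helper name, not a standard command.
sum_in_order() {
  awk -F, '
    !($1 in sum) { order[++n] = $1 }   # first time this key is seen: remember its position
    { sum[$1] += $2 }                  # accumulate the second field per key
    END {
      for (i = 1; i <= n; i++)         # walk the keys in first-seen order
        print order[i] FS sum[order[i]]
    }
  ' "$@"
}
```

Called as `sum_in_order sample.txt`, this should print ABC,300, XYZ,150 and QWE,100 in that order for the sample file from the question.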

steffen
  • @steffen One issue I found: if the values are large, the command above prints the result in exponential notation. – Programmer Sep 27 '18 at 14:56
  • @Programmer Use gawk instead of awk. If that's still not enough and you have extremely big integers, use `gawk -M` or `gawk --bignum`. – steffen Sep 27 '18 at 15:11
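
A different approach from the one suggested in the comment, and not part of the original answer, is to format the total explicitly with `printf` instead of relying on how `print` converts numbers. This is only a sketch, assuming the totals are whole numbers below 2^53 (the range a double-precision float represents exactly); beyond that, `gawk -M` is still needed. The function name `sum_plain` is made up for this example:

```shell
# sum_plain is a hypothetical helper name, not a standard command.
sum_plain() {
  awk -F, '
    { total[$1] += $2 }                       # accumulate the second field per key
    END {
      for (key in total)
        printf "%s,%.0f\n", key, total[key]   # %.0f writes all digits, no exponent
    }
  ' "$@"
}
```

Usage would be `sum_plain sample.txt`. The output order is still whatever order awk happens to visit the keys in.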
1

It's an awk one-liner:

awk -F, -v OFS=, '{sum[$1]+=$2} END {for (key in sum) print key, sum[key]}' sample.txt > output.txt

sum[$1] += $2 creates an associative array whose keys are the first field and values are the corresponding sums.
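
Because the sums live in an associative array, `for (key in sum)` may return the keys in any order. If a stable order is wanted in output.txt, one simple option is to sort the result by the first field afterwards. A minimal sketch, where the function name `sum_sorted` is made up for this example:

```shell
# sum_sorted is a hypothetical helper name, not a standard command.
sum_sorted() {
  # Sum the second field per key, then sort the lines by the key column.
  awk -F, -v OFS=, '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' "$@" |
    sort -t, -k1,1
}
```

`sum_sorted sample.txt > output.txt` then writes the keys in alphabetical order, which is not the same as the order of first appearance shown in the question.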

Barmar
1

This can also be done easily enough in native bash. The following uses no external tools, no subshells and no pipelines, and is thus far faster (I'd place money on 100x the throughput on a typical/reasonable system) than your original code:

declare -A sums=( )   # associative arrays need bash 4.0 or later
while IFS=, read -r name val; do
  sums[$name]=$(( ${sums[$name]:-0} + val ))
done < sample.txt

for key in "${!sums[@]}"; do
  printf '%s,%s\n' "$key" "${sums[$key]}"
done

If you want to, you can make this a one-liner:

declare -A sums=( ); while IFS=, read -r name val; do sums[$name]=$(( ${sums[$name]:-0} + val )); done < sample.txt; for key in "${!sums[@]}"; do printf '%s,%s\n' "$key" "${sums[$key]}"; done
Charles Duffy