
The file data.txt contains the following:

1.00 1.23 54.4 213.2 3.4

The output of the script is supposed to be:

ave: 54.646

Some simple scripts are preferred.

Chris Seymour
JackWM

4 Answers


Here is one method:

$ awk '{s+=$1}END{print "ave:",s/NR}' RS=" " file
ave: 54.646
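
A field-based variant (a sketch, assuming the numbers are whitespace-separated and may span several lines) sums every field and keeps its own count, which also sidesteps the division-by-zero problem on empty input:

$ awk '{for(i=1;i<=NF;i++){s+=$i;n++}}END{if(n)print "ave:",s/n}' file
ave: 54.646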
Chris Seymour
    That will produce a division by zero error on an empty file. You need something like `awk '{s+=$1}END{print "ave:",(NR?s/NR:"NaN")}' RS=" " file` – Ed Morton Jun 07 '16 at 12:11
    As someone new to awk, I ran this on this list: 8.20 7.10 8.10 7.40 6.50 8.40 7.90 8.50 9.30 8.80 9.80 9.50 8.20 9.20 8.30 9.10 and got an average of 8.2. If I sum the values with awk '{s+=$1}END{print "sum:" s}' I get 134.3. Dividing that by the number of values gives 8.39..., which matches what a spreadsheet calculates. Does awk reduce the number of significant figures? – Montag Mar 16 '23 at 02:18

Another option is to use jq:

$ seq 100|jq -s add/length
50.5

-s (--slurp) reads the whole input and collects the parsed values into a single array; each line is parsed as JSON, in this case just a number.
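
A quick way to see what --slurp builds (a small illustration with three numbers piped through jq):

$ printf '1\n2\n3\n' | jq -sc .
[1,2,3]
$ printf '1\n2\n3\n' | jq -s add/length
2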

Edit: awk is faster and it doesn't require reading the whole input into memory:

$ time seq 1e6|awk '{x+=$0}END{print x/NR}'>/dev/null

real  0m0.145s
user  0m0.148s
sys   0m0.008s
$ time seq 1e6|jq -s add/length>/dev/null

real  0m0.685s
user  0m0.669s
sys   0m0.024s
nisetama
    probably one of the most creative ways to use `jq` i've seen! – morganbaz Apr 14 '19 at 16:50
  • this has the advantage of being able to be used after having used jq to filter out the necessary numbers, like: ` jq -r '.SpotPriceHistory[].SpotPrice' | jq -s add/length` – dimisjim Oct 19 '20 at 15:15
  • FYI if you're averaging a lot of numbers. This is definitely way slower than the awk method. I can't be bothered to benchmark it, but for 14 million integers, `awk` returns nearly instantly while the `jq` method takes awhile... – WattsInABox Oct 18 '21 at 14:51
  • Yeah although this is super neat and although it's not *that* slow, this is indeed pretty slow, only suitable for processing very small datasets. – Steven Lu Nov 02 '21 at 22:59
  • @nisetama I had a billion lines or so to work with, other solutions finished, jq's didn't because I gave up on it :) – WattsInABox Nov 08 '21 at 16:49
  • https://stedolan.github.io/jq/manual/#Advancedfeatures – Chris Smith Oct 13 '22 at 21:40
perl -lane '$a+=$_ for(@F);print "ave: ".$a/scalar(@F)' file

If you have multiple lines and you just need a single average:

perl -lane '$a+=$_ for(@F);$f+=scalar(@F);END{print "ave: ".$a/$f}' file
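
For the data.txt from the question (a single line of five values), either one-liner should print the expected result:

$ perl -lane '$a+=$_ for(@F);$f+=scalar(@F);END{print "ave: ".$a/$f}' data.txt
ave: 54.646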
Vijay

I had to find the average time taken by each ping request:

ping -c 5 www.youtube.com | grep time= | awk -F '[= ]' '{ sum += $11 } END { printf("%.2f ms\n", sum/NR) }'
  • -c in ping sets the number of ICMP echo requests to send.

  • -F in awk (not grep) specifies the field separators; '[= ]' makes both '=' and the space act as separators, which is why the time value ends up in field 11 (see the sample reply line after this list).

  • printf lets us control the precision shown after the decimal point; here it is two decimal places (%.2f).

  • NR is a built-in awk variable that holds the number of records read; here that is the number of ping replies that matched time=.

  • We calculate the average by dividing sum by NR.
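
To see where field 11 comes from, here is an illustrative reply line (host and timing are made up) split the same way; with -F '[= ]' both '=' and the space separate fields, so the value after time= lands in $11:

$ echo '64 bytes from example.com (93.184.216.34): icmp_seq=1 ttl=56 time=12.3 ms' | awk -F '[= ]' '{print $11}'
12.3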


You can also find the sum (the total time taken by all the ping requests):

ping -c 5 www.youtube.com | grep time= | awk -F '[= ]' '{ sum += $11 } END { printf("%.2f ms\n", sum) }'
Udesh