-1

I have this file:

Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds

I would like to compute the mean and standard deviation of the times directly in the terminal. Can awk help? I am not very familiar with it. I tried to extract the column of numerical values with cat <filename> | awk -F "Took:" {print$2}, but it just returned the whole content of the file.

Inian
dada

5 Answers

3

Could you please try the following to get the mean of the 2nd column:

awk '{sum+=$2;if($2){count++}} END{print sum/count}'  Input_file

EDIT:

awk '$2!=""{count++;sum+=$2;y+=$2^2} END{mean=sum/count;var=y/count-mean^2;if(var<0)var=0;print "Mean = " mean ORS "S.D = " sqrt(var)}'  Input_file
RavinderSingh13
3

The Wikipedia page on Standard deviation has an interesting section, "Rapid calculation methods". Of particular interest is Welford's algorithm, which is simple and numerically stable:

A_0, Q_0 = 0, 0
for k in (1, ...):
    j = k-1
    A_k = A_j + (X_k-A_j)/k
    Q_k = Q_j + (X_k-A_j)*(X_k-A_k)

where, at every step, A_k is equal to the running mean and Q_k is related to the population variance σ² by the relation Q_k = σ²*k.

With this theoretical background, we can write

$ awk 'BEGIN{a=0;q=0}{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' file
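
As a quick sanity check, the same one-liner can be run on a small input whose answer is known in advance. This is only a sketch: the wrapper name welford and the eight sample values are made up for illustration (they have mean 5 and population standard deviation 2), and the rows are generated in the same "Took: <value> seconds" layout as the file in the question.

```shell
# Hypothetical helper: the one-liner above, reading the value from column 2.
# With no file argument it reads standard input.
welford() {
    awk '{x=$2; b=a+(x-a)/NR; q+=(x-a)*(x-b); a=b} END{print a, sqrt(q/NR)}' "$@"
}

# Eight made-up rows; printf repeats the format once per argument.
printf 'Took: %s seconds\n' 2 4 4 4 5 5 7 9 | welford
# prints: 5 2
```

Dividing q by NR gives the population standard deviation; if the sample standard deviation is wanted instead, the divisor would be NR-1.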
gboffi
  • `BEGIN{a=0;q=0}` is not strictly necessary because in Awk numerical variables are automatically initialized to 0, but I wanted to mimic the published algorithm as closely as possible. In other words, the one-liner `awk '{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}'` is equivalent to the one in the answer. – gboffi Dec 14 '18 at 10:42
  • last link is broken. – karakfa Dec 14 '18 at 14:04
  • @karakfa changed the link, thanks a lot for the heads up – gboffi Dec 14 '18 at 21:29
2

Another quick way:

$ awk '{s+=$2; ss+=$2^2} END{print m=s/NR, sqrt(ss/NR-m^2)}' file

20.053 13.4924
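
The sum-of-squares shortcut subtracts two large, nearly equal quantities, so it can lose precision when the values are large compared with their spread. For timings like the ones in the question this should not matter. The sketch below compares it with Welford's update on four made-up values near 10^9 whose population variance is 22.5; the helper names naive_var and stable_var are hypothetical, and both print the variance rather than its square root so that a negative result stays visible.

```shell
# Hypothetical helper: variance via the sum-of-squares shortcut.
naive_var() {
    awk '{s+=$2; ss+=$2^2} END{m=s/NR; print ss/NR-m^2}' "$@"
}

# Hypothetical helper: variance via Welford's running update.
stable_var() {
    awk '{x=$2; b=a+(x-a)/NR; q+=(x-a)*(x-b); a=b} END{print q/NR}' "$@"
}

# Made-up values: 10^9 plus 4, 7, 13 and 16 (true population variance 22.5).
data=$(printf 'Took: %s seconds\n' 1000000004 1000000007 1000000013 1000000016)

# The shortcut may print something other than 22.5, possibly a negative number.
printf '%s\n' "$data" | naive_var
# Welford's update prints 22.5 for this input.
printf '%s\n' "$data" | stable_var
```

If the shortcut is kept for large values, clamping a negative variance to 0 before calling sqrt avoids a nan in the output, though it does not recover the lost digits.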
karakfa
1
$ cat tst.awk
{ numbers[NR] = $2; sum += $2 }
END {
    mean = sum / length(numbers)
    # calculate std deviation
    for (i in numbers) {
        dif = numbers[i] - mean
        std += dif ^ 2
    }
    std = sqrt(std / length(numbers))

    print "Mean: " mean
    print "Standard Deviation: " std
}
$
$ awk -f tst.awk file
Mean: 20.053
Standard Deviation: 13.4924
oguz ismail
1

Using a Perl one-liner:

> cat dada.txt 
Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds
> perl -lane '$s+=$F[1];push(@a,$F[1]); END { $m=$s/@a; $sd+=($_-$m)**2 for(@a);$sd=sqrt($sd/@a); print "Mean:$m\nStandard Deviation:$sd"} ' dada.txt
Mean:20.0530427826775
Standard Deviation:13.4923983082523
> 
stack0114106