
I have data that looks like this:

condition A
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0

Then I calculated the mean of this condition, 0.875, with the awk command below (it simply sums all the values and divides by the number of rows):

Mean: cat $a.csv | awk -F"," '$1=="Picture" && $2=="1" && $3=="hit" && $4==1{c++} END {print c/16}'

My question is: how do I calculate the standard deviation of this condition? I already know from Excel that the SD of this condition is 0.3415650255.

I have already tried several awk commands, but I still cannot get this result:

cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4=="2"{c++} END {c=0;ssq=0;for (i=1;i<=16;i++){c+=$i;ssq+=$i**2}; print (ssq/16-(c/16)**2)**0.5}'

cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} {delta=$4-(c/16); avg==delta/16;mean2+=delta*($4-avg);} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt(mean2/16) }'

cat $a.csv | awk -F"," '$1=="Picture" && $2=="2" && $3=="hit" && $4==2{c++} END { avg=c/16; printf "mean: %f. standard deviation: %f \n", avg, sqrt((c/16-1)-(c/16-1)^2)  }'

None of these gives the correct standard deviation for this condition. Does anyone know where the problem is?

Zai-Fu Yao

1 Answer


Recall how the standard deviation is calculated. You need all the values, because you need each value's individual difference from the mean.
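For reference, here is the textbook definition. The figure Excel reports (0.3415650255) is the *sample* standard deviation, which is what Excel's `STDEV` / `STDEV.S` return, and it divides by n-1 rather than n:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Dividing by n instead gives the population standard deviation (Excel's `STDEV.P`), which for this data is about 0.3307. That would not match the Excel figure.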

Doing it manually first, in Excel:

[Screenshot of the manual calculation in Excel]
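With the 16 values from the question (fourteen 1s and two 0s), the manual calculation works out as follows and reproduces the Excel result:

```latex
\begin{aligned}
\bar{x} &= \tfrac{14}{16} = 0.875\\
\sum_{i=1}^{16}(x_i-\bar{x})^2 &= 14\,(1-0.875)^2 + 2\,(0-0.875)^2 = 0.21875 + 1.53125 = 1.75\\
s &= \sqrt{\tfrac{1.75}{16-1}} = \sqrt{0.11\overline{6}} \approx 0.3415650
\end{aligned}
```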

Now you can implement that easily in any language that has arrays and math functions.

In awk:

$ echo "1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0" | tr " " "\n" > file
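$ awk 'function sdev(array,    i, sum, cnt, mean, sqdif) {   # extra params act as local variables
     for (i=1; i in array; i++)
        sum+=array[i]
     cnt=i-1
     mean=sum/cnt
     for (i=1; i in array; i++)
        sqdif+=(array[i]-mean)^2        # ^ is the POSIX power operator (** is not in POSIX)
     return (sqdif/(cnt-1))^0.5         # divide by n-1: sample SD, as Excel STDEV does
     }
     {sum1[FNR]=$1}
     END {print sdev(sum1)}' file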
0.341565
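The question's first attempt was heading toward a one-pass alternative that needs only a running count, sum and sum of squares, with no array at all. It went wrong for two reasons. Inside `END`, `$i` refers to the fields of the last input line, not to the 16 values. And `ssq/16-(c/16)**2` is the population variance, which divides by n, while Excel's figure divides by n-1. Below is a minimal sketch of that variant. It assumes one number per line in the first column, and `sd_onepass` is a hypothetical helper name:

```shell
# Hypothetical helper: one-pass mean and sample SD of column 1 (one value per line).
sd_onepass() {
  awk '
    { n++; s += $1; ss += $1 * $1 }     # running count, sum and sum of squares
    END {
      if (n < 2) { print "need at least 2 values" > "/dev/stderr"; exit 1 }
      mean = s / n
      # sample variance = (sum of squares - n*mean^2) / (n-1), matching Excel STDEV
      printf "mean: %f  sd: %f\n", mean, sqrt((ss - n * mean * mean) / (n - 1))
    }' "$@"
}

printf '%s\n' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 | sd_onepass
```

This trades the array for two accumulators. For data with large values and a tiny spread, the subtraction `ss - n*mean^2` can lose precision, so the two-pass function in the answer is the safer general choice.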
dawg
  • I tried your command like this: `awk 'function sdev(array) {for (i=1; i in array; i++) { sum+=array[i]} {cnt=i-1} {mean=sum/cnt} {for (i=1; i in array; i++) {sqdif+=(array[i]-mean)**2} return (sqdif/(cnt-1))**0.5} {sum1[FNR]=$1} {print sdev(sum1)}}' t1.csv` It did not produce any error, but it did not print anything either. Can you spot what I missed? Thanks – Zai-Fu Yao Aug 01 '17 at 18:27
  • If you really have a `.csv` file you probably need to change the field separator to `awk -F"," 'the rest of the awk program...'` – dawg Aug 01 '17 at 18:31
  • Thanks. I tried, but it still didn't work. Do you think the CSV file is the problem? I saw that you echo newline-separated values into the file; maybe that's a problem as well? `cat t1.csv 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 ` – Zai-Fu Yao Aug 01 '17 at 18:43
  • Add example data to your question. – dawg Aug 01 '17 at 18:46
  • like this: `cat t1.csv 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1` – Zai-Fu Yao Aug 01 '17 at 19:36
  • That is not a CSV file; it is space-separated. Use the `tr` that I used to translate the spaces into `\n`, or use `split` in `awk`. – dawg Aug 01 '17 at 19:43
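Following the `split` suggestion, here is a minimal sketch for a file like `t1.csv` where all values sit on one space-separated line. It prints one sample SD per input line. It assumes every line holds blank-separated numbers, and `sd_per_line` is a hypothetical helper name. Note that the main rule stays outside the function body here. In the attempt quoted in the comments, the final closing brace put `{sum1[FNR]=$1}` and `{print sdev(sum1)}` inside `sdev`, so the program had no rules to run and printed nothing.

```shell
# Hypothetical helper: sample SD of the values on each line of a space-separated file.
sd_per_line() {
  awk 'function sdev(arr, n,    i, sum, mean, sqdif) {   # i, sum, mean, sqdif are locals
         for (i = 1; i <= n; i++) sum += arr[i]
         mean = sum / n
         for (i = 1; i <= n; i++) sqdif += (arr[i] - mean) ^ 2
         return sqrt(sqdif / (n - 1))       # n-1: sample SD, like Excel STDEV
       }
       {
         n = split($0, vals, " ")           # " " splits on runs of blanks
         if (n > 1) print sdev(vals, n)     # main rule lives outside the function
       }' "$@"
}

echo "1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1" > t1.csv   # the data shown in the comments
sd_per_line t1.csv
```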