I am working in Bash and trying to write an Awk script that takes data from a CSV file, groups the data by rows and then get the min, max, and avg of the values.
Here is the complete CSV file:
Student,Catehory,Assignment,Score,Possible
Chelsey,Homework,H01,90,100
Chelsey,Homework,H02,89,100
Chelsey,Homework,H03,77,100
Chelsey,Homework,H04,80,100
Chelsey,Homework,H05,82,100
Chelsey,Homework,H06,84,100
Chelsey,Homework,H07,86,100
Chelsey,Lab,L01,91,100
Chelsey,Lab,L02,100,100
Chelsey,Lab,L03,100,100
Chelsey,Lab,L04,100,100
Chelsey,Lab,L05,96,100
Chelsey,Lab,L06,80,100
Chelsey,Lab,L07,81,100
Chelsey,Quiz,Q01,100,100
Chelsey,Quiz,Q02,100,100
Chelsey,Quiz,Q03,98,100
Chelsey,Quiz,Q04,93,100
Chelsey,Quiz,Q05,99,100
Chelsey,Quiz,Q06,88,100
Chelsey,Quiz,Q07,100,100
Chelsey,Final,FINAL,82,100
Chelsey,Survey,WS,5,5
Sam,Homework,H01,19,100
Sam,Homework,H02,82,100
Sam,Homework,H03,95,100
Sam,Homework,H04,46,100
Sam,Homework,H05,82,100
Sam,Homework,H06,97,100
Sam,Homework,H07,52,100
Sam,Lab,L01,41,100
Sam,Lab,L02,85,100
Sam,Lab,L03,99,100
Sam,Lab,L04,99,100
Sam,Lab,L05,0,100
Sam,Lab,L06,0,100
Sam,Lab,L07,0,100
Sam,Quiz,Q01,91,100
Sam,Quiz,Q02,85,100
Sam,Quiz,Q03,33,100
Sam,Quiz,Q04,64,100
Sam,Quiz,Q05,54,100
Sam,Quiz,Q06,95,100
Sam,Quiz,Q07,68,100
Sam,Final,FINAL,58,100
Sam,Survey,WS,5,5
Andrew,Homework,H01,25,100
Andrew,Homework,H02,47,100
Andrew,Homework,H03,85,100
Andrew,Homework,H04,65,100
Andrew,Homework,H05,54,100
Andrew,Homework,H06,58,100
Andrew,Homework,H07,52,100
Andrew,Lab,L01,87,100
Andrew,Lab,L02,45,100
Andrew,Lab,L03,92,100
Andrew,Lab,L04,48,100
Andrew,Lab,L05,42,100
Andrew,Lab,L06,99,100
Andrew,Lab,L07,86,100
Andrew,Quiz,Q01,25,100
Andrew,Quiz,Q02,84,100
Andrew,Quiz,Q03,59,100
Andrew,Quiz,Q04,93,100
Andrew,Quiz,Q05,85,100
Andrew,Quiz,Q06,94,100
Andrew,Quiz,Q07,58,100
Andrew,Final,FINAL,99,100
Andrew,Survey,WS,5,5
Ava,Homework,H01,55,100
Ava,Homework,H02,95,100
Ava,Homework,H03,84,100
Ava,Homework,H04,74,100
Ava,Homework,H05,95,100
Ava,Homework,H06,84,100
Ava,Homework,H07,55,100
Ava,Lab,L01,66,100
Ava,Lab,L02,77,100
Ava,Lab,L03,88,100
Ava,Lab,L04,99,100
Ava,Lab,L05,55,100
Ava,Lab,L06,66,100
Ava,Lab,L07,77,100
Ava,Quiz,Q01,88,100
Ava,Quiz,Q02,99,100
Ava,Quiz,Q03,44,100
Ava,Quiz,Q04,55,100
Ava,Quiz,Q05,66,100
Ava,Quiz,Q06,77,100
Ava,Quiz,Q07,88,100
Ava,Final,FINAL,99,100
Ava,Survey,WS,5,5
Shane,Homework,H01,50,100
Shane,Homework,H02,60,100
Shane,Homework,H03,70,100
Shane,Homework,H04,60,100
Shane,Homework,H05,70,100
Shane,Homework,H06,80,100
Shane,Homework,H07,90,100
Shane,Lab,L01,90,100
Shane,Lab,L02,0,100
Shane,Lab,L03,100,100
Shane,Lab,L04,50,100
Shane,Lab,L05,40,100
Shane,Lab,L06,60,100
Shane,Lab,L07,80,100
Shane,Quiz,Q01,70,100
Shane,Quiz,Q02,90,100
Shane,Quiz,Q03,100,100
Shane,Quiz,Q04,100,100
Shane,Quiz,Q05,80,100
Shane,Quiz,Q06,80,100
Shane,Quiz,Q07,80,100
Shane,Final,FINAL,90,100
Shane,Survey,WS,5,5
Basically, I have 5 student names and each student has completed a quiz, lab, homework for each lesson name, plus a survey and a final exam...
What I am trying to do is group this by Assignment name and generate a report that shows the lowest score achieved for that assignment, the highest score and the average score...
The output should be:
Name Low High Avg
H02 66 99 74.22
L07 47 88 66.30
and include every individual assignment name from column 3 ($3). formatted using tab (/t)
The code I have pasted already outputs the headings and the 2 decimal places in avg column but the actual values are not correct.
I have only two issues really:
I cannot for the life of me get the min or max for the individual groupings. I know how to get the min/max and even the basic syntax for it, but how do I get it to the individual groups?
Scripting this. I have very limited experience using bash, or anything Linux for that matter and am unfamiliar with awk (though I am learning quite a bit now).
So, to get myself at least started i wrote a one liner to achieve the grouping and the output formatting I am looking for, but it is only summing the scores for each group and the average is all messed up because I still have not figured out how to get the count of the scores to use as a divisor.
Anyways, this is what I have:
awk -F "," 'BEGIN{printf "Name\tLow\tHigh\tAvg\n"}
NR>=2{a[$3]+=$4; b[$3]+=$4;c[$3]+=$4/FNR }
END {for (i in b) printf "%-7s\t%d\t%d\t%.02f\n", i,a[i],b[i],c[i]}' \
score-data.csv
The output is perfect in that it is grouping by the assignment names, 2 decimals in the avg column and tabbed.... but the low and high are not correct and the average, as you can see is messed up.. tried dividing the sum by FNR. Have also tried NF and NR both... no luck. Again, I know how to get a count, but no clue how to get it in here.
So, if anyone can help me get the min/max/avg taken care of and also with the syntax for this to be a script, it would be appreciated
I cannot comment for some reason, but I have searched google and read the man awk stuff and have two different tabs open in my browser to docs on awk. None of them address it for my situation.
As far as the array naming goes, it is all the same array being used; an associative array that uses column 3 as the index/key and values from column 4 as the key's values. all of the suggested searches and links involve columns; I need rows.