Top awk result cut off with count condition

Question

I have a file of student records, grades, that I want to sort by GPA, returning the top 5 results. For some reason, a condition of count<=7 or lower cuts off the top result. I can't figure out why that is.

Also, is there a more elegant way to remove the first column after sorting than piping the results back into awk from sort?

user@machine:~> awk '{ if (count<=7) print $3, $0; count++; }' grades | sort -nr | awk '{ print $2 "     " $3 "     " $4 "     " $5 }'
Ahmad     Rashid     3.74     MBA
James     Davis     3.71     ECE
Sam     Chu     3.68     ECE
John     Doe     3.54     ECE
Arun     Roy     3.06     SS
James     Adam     2.77     CS
Al     Davis     2.63     CS
Rick     Marsh     2.34     CS

user@machine:~> awk '{ if (count<=8) print $3, $0; count++; }' grades | sort -nr | awk '{ print $2 "     " $3 "     " $4 "     " $5 }'
Art     Pohm     4.00     ECE
Ahmad     Rashid     3.74     MBA
James     Davis     3.71     ECE
Sam     Chu     3.68     ECE
John     Doe     3.54     ECE
Arun     Roy     3.06     SS
James     Adam     2.77     CS
Al     Davis     2.63     CS
Rick     Marsh     2.34     CS

grades:

John    Doe     3.54    ECE
James   Davis    3.71    ECE
Al      Davis    2.63    CS
Ahmad   Rashid  3.74    MBA
Sam     Chu      3.68    ECE
Arun    Roy      3.06    SS
Rick    Marsh   2.34    CS
James   Adam    2.77    CS
Art     Pohm    4.00    ECE
John    Clark    2.68    ECE
Nabeel  Ali     3.56    EE
Tom     Nelson  3.81    ECE
Pat     King    2.77    SS
Jake    Zulu    3.00    CS
John    Lee     2.64    EE
Sunil   Raj     3.36    ECE
Charles Right   3.31    EECS
Diane   Rover   3.87    ECE
Aziz    Inan    3.75    EECS
Lu      John    3.06    CS
Lee     Chow    3.74    EE
Adam    Giles   2.54    SS
Andy    John    3.98    EECS

If you add a debug statement for the value of `count`, you may see what your problem is. Good luck. — shellter, Feb 12 '17 at 02:55

dawg · Accepted Answer · 2017-02-12T17:33:14.397

You do not actually need awk in this case. Unix sort can sort numerically on a given column, so there is no need to prepend the GPA and strip it off again afterwards.

Given your input:

$ sort -k 3 -nr grades 
Art     Pohm    4.00    ECE
Andy    John    3.98    EECS
Diane   Rover   3.87    ECE
Tom     Nelson  3.81    ECE
Aziz    Inan    3.75    EECS
Lee     Chow    3.74    EE
Ahmad   Rashid  3.74    MBA
James   Davis    3.71    ECE
Sam     Chu      3.68    ECE
Nabeel  Ali     3.56    EE
John    Doe     3.54    ECE
Sunil   Raj     3.36    ECE
Charles Right   3.31    EECS
Lu      John    3.06    CS
Arun    Roy      3.06    SS
Jake    Zulu    3.00    CS
Pat     King    2.77    SS
James   Adam    2.77    CS
John    Clark    2.68    ECE
John    Lee     2.64    EE
Al      Davis    2.63    CS
Adam    Giles   2.54    SS
Rick    Marsh   2.34    CS

Then just use head:

$ count=7
$ sort -k 3 -nr grades | head -n $count
Art     Pohm    4.00    ECE
Andy    John    3.98    EECS
Diane   Rover   3.87    ECE
Tom     Nelson  3.81    ECE
Aziz    Inan    3.75    EECS
Lee     Chow    3.74    EE
Ahmad   Rashid  3.74    MBA
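
As for why the original pipeline lost the top record: the awk condition keeps only the first eight lines of the file (count runs from 0 to 7) before sort ever sees them, and Art Pohm is on line 9. A minimal sketch of the difference between limiting before and after the sort, using a hypothetical five-line file named sample_grades (a subset of the records above) and a limit of 3:

```shell
# Hypothetical sample: a small subset of the records from the question.
cat > sample_grades <<'EOF'
John    Doe     3.54    ECE
James   Davis   3.71    ECE
Al      Davis   2.63    CS
Ahmad   Rashid  3.74    MBA
Art     Pohm    4.00    ECE
EOF

# Limit first, then sort: only the first 3 input lines are ever considered,
# so the 4.00 record on line 5 can never appear in the result.
limit_then_sort=$(awk 'NR <= 3' sample_grades | sort -k3,3nr)

# Sort first, then limit: the 3 highest GPAs in the whole file.
sort_then_limit=$(sort -k3,3nr sample_grades | head -n 3)

printf '%s\n\n%s\n' "$limit_then_sort" "$sort_then_limit"
```

The first list tops out at 3.71, while the second starts with the 4.00 record. Writing the key as -k3,3 restricts it to the GPA column alone, which is an optional tightening of the -k 3 used above.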

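If you want to use gawk, you can store each line and its GPA in arrays keyed by line number, then use asorti with a custom comparison function to get the line numbers ordered by GPA. You might do something along these lines: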

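awk -v count=7 '
# Comparison function for asorti: order by value (the GPA), highest first.
function sort_by_num(i1, v1, i2, v2) {
    return (v2-v1)
}
{   lines[NR]=$0    # the full record, keyed by line number
    idx[NR]=$3      # the GPA, keyed by the same line number
}
END {
    # si[1..N] receives the line numbers of idx, ordered by sort_by_num.
    asorti(idx, si, "sort_by_num");
    for(n = 1; n <= count; ++n) {
        print lines[si[n]]
    }
}' grades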
Art     Pohm    4.00    ECE
Andy    John    3.98    EECS
Diane   Rover   3.87    ECE
Tom     Nelson  3.81    ECE
Aziz    Inan    3.75    EECS
Ahmad   Rashid  3.74    MBA
Lee     Chow    3.74    EE

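Note that the last two lines come out in a different order from sort than from the gawk function. The function only compares the GPA, so you would need to extend it to say how records with the same GPA should be ordered. In this run gawk left the tied records in input order (the gawk manual does not guarantee any particular order for elements that compare equal), whereas sort broke the tie with its last-resort comparison of the whole line. (If you add the -s switch to sort, that extra comparison is disabled and the output is identical.)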
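
If the order of tied GPAs matters, it may be simpler to state it explicitly than to depend on either tool's tie behaviour. A sketch, assuming ties should be broken by last name (field 2) in ascending order, and using a hypothetical three-line file named sample_grades taken from the records above:

```shell
# Hypothetical sample: two records tied at 3.74 plus one higher GPA.
cat > sample_grades <<'EOF'
Ahmad   Rashid  3.74    MBA
Aziz    Inan    3.75    EECS
Lee     Chow    3.74    EE
EOF

# Primary key: field 3, numeric, descending.
# Secondary key: field 2, ascending, with leading blanks ignored (the b flag).
by_gpa_then_name=$(sort -k3,3nr -k2b,2 sample_grades)

# -s keeps tied records in input order instead of applying the
# last-resort whole-line comparison.
by_gpa_stable=$(sort -s -k3,3nr sample_grades)

printf '%s\n\n%s\n' "$by_gpa_then_name" "$by_gpa_stable"
```

With the secondary key, Chow is listed before Rashid; with -s alone, the two tied records stay in the order they appear in the file.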

Yeah I just realized using `awk` was not a requirement for that question. I'm still curious about why `awk` is behaving that way though. `count<=8` jumps from `4.00` to `3.74` as well which I didn't notice at first. — Jo P, Feb 12 '17 at 03:12
