Using awk to get the maximum value of a column, for each unique value of another column

Question

So I have a file such as:

10 1 abc
10 2 def
10 3 ghi
20 4 elm
20 5 nop
20 6 qrs
30 3 tuv

I would like to get the maximum value of the second column for each value of the first column, i.e.:

10 3 ghi
20 6 qrs
30 3 tuv

How can I do using awk or similar unix commands?

Are the lines always in incrementing order of 2nd column and wit the first column always organized with common values next to each other? — Ed Morton, Jan 28 '16 at 21:49

anubhava · Accepted Answer · 2016-01-28T18:39:36.873

8

You can use awk:

awk '$2>max[$1]{max[$1]=$2; row[$1]=$0} END{for (i in row) print row[i]}' file

Output:

10 3 ghi
20 6 qrs
30 3 tuv

Explanation:

awk command uses an associative array max with key as $1 and value as $2. Every time we encounter a value already stored in this associative array max, we update our previous entry and store whole row in another associative array row with the same key. Finally in END section we simply iterate over associative array row and print it.

edited Jan 28 '16 at 18:39

answered Jan 28 '16 at 18:28

anubhava

761,203
64
569
643

2

This gives what was asked for. I would suggest to store `$0` instead of just `$3` for each maximum line. This would simplify the printing at the end and would work for a number of fields >3 (even a variable number of fields). – e0k Jan 28 '16 at 18:32
Thank you very much @anubhava for your quick answer. Can you please provide an explanation about the code? Also, Thank you e0k for the generalization, it was very useful for my needs. – leonard vertighel Jan 28 '16 at 18:36
1

Added an explanation in the answer. – anubhava Jan 28 '16 at 18:39

score 7 · Answer 2 · answered Jan 28 '16 at 18:55

7

shorter alternative with sort

$ sort -k1,1 -k2,2nr file | sort -u -k1,1

10 3 ghi
20 6 qrs
30 3 tuv

sort by field one and field two (numeric, reverse) so that max for each key will be top of the group, pick the first for each key by the second sort.

answered Jan 28 '16 at 18:55

karakfa

66,216
7
41
56

Using awk to get the maximum value of a column, for each unique value of another column

2 Answers2

Linked