Find min values in field 2 by looping through certain number of records using AWK

Question

I have a dataset file with three fields:

Field 1 acts as an ID.

Field 2 is the value whose minimum I need.

Field 3 is a boolean, either 0 or 1.

I need to find the minimum value in field 2, but separately for each value of field 1. For example, consider the dataset below.

dataset

I need to compare the values in field 2 for the first 3 records and check whether field 3 is 1 for the record with the minimum value in field 2. If so, increment a count.

Then find the minimum in field 2 again, but for the records with field 1 = 2, that is, only records 4 and 5, and so on.

What would be the best way to go about it? The file contains approximately 2,000,000 records.

Is it possible to sort by field 2 and then take one record for each distinct value of field 1?

karakfa · Accepted Answer · 2016-06-20T18:49:26.777


The easiest way:

$ sort -n file | awk '!a[$1]++'

1 0.12  1
2 0.056 0
3 0.982 0

To get the count (the sum of field 3 over those records):

$ sort -n file | awk '!a[$1]++{sum+=$3} END{print sum}'
1
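
One caveat worth checking on real data: with a plain sort -n, lines that share the same field 1 are ordered only by sort's fallback whole-line comparison, which is textual, so a value like 10.5 sorts ahead of 9.2. The sketch below uses made-up sample data (not the asker's file) and a temporary file from mktemp to compare the plain pipeline with a variant that names each key explicitly. LC_ALL=C is set only to make the textual fallback deterministic.

```shell
# Hypothetical sample data (id, value, flag); not the asker's real file.
data=$(mktemp)
printf '%s\n' '1 0.5 0' '1 0.12 1' '1 10.3 0' '2 9.2 1' '2 10.5 0' '3 0.982 0' > "$data"

# Plain numeric sort: lines with the same id fall back to textual order,
# so "2 10.5 0" lands ahead of "2 9.2 1".
naive=$(LC_ALL=C sort -n "$data" | awk '!a[$1]++ { sum += $3 } END { print sum + 0 }')

# Explicit keys: id first, then value, each compared as a number.
robust=$(LC_ALL=C sort -k1,1n -k2,2n "$data" | awk '!a[$1]++ { sum += $3 } END { print sum + 0 }')

echo "sort -n: $naive, explicit keys: $robust"
rm -f "$data"
```

On this sample the two counts differ (1 versus 2) because id 2's true minimum, 9.2, has flag 1. If every value in field 2 has the same number of integer digits, as in the output shown above, both pipelines should agree.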

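However, if there is a tie in field 2 and you want to pick the record whose last field is 1, you also have to sort field 3 in reverse, e.g. sort -k1,1n -k2,2n -k3,3nr. Each key is restricted to a single field here; a combined key such as -k1,2n would be compared numerically on its leading number only, which is field 1.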

Explanation

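!a[$1]++ is an awk idiom that selects the first line seen for each distinct value of field 1. a[$1] is a counter keyed by field 1. It is 0 (false) the first time a key appears, so the negation is true and the line is printed; the post-increment then makes it nonzero, so the negation is false for every later line with the same key.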
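
To make the idiom concrete, here is a small sketch that runs it next to a spelled-out equivalent. The input lines and the variable names (input, short, long) are invented for illustration.

```shell
# Hypothetical input: the key is the first field.
input='a x
b y
a z
b w
c v'

# Short idiom: print a line only the first time its key is seen.
short=$(printf '%s\n' "$input" | awk '!a[$1]++')

# Equivalent long form, spelled out step by step.
long=$(printf '%s\n' "$input" | awk '{
    if (a[$1] == 0)   # unseen key: its counter is still empty, i.e. 0
        print         # so print the whole line
    a[$1]++           # then record that the key has been seen
}')

echo "$short"
```

Both forms keep "a x", "b y" and "c v" and drop the later lines whose key has already been seen.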

Sorting: the first two fields are in ascending numerical order (so 2 < 11), and the third is in descending (reverse) order so that 1 appears before 0. Since the last field is a single digit, it doesn't matter whether it is sorted numerically or lexically; otherwise you would want it to be numerical too.
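
The ordering described above can be tried on a small file. This is only a sketch: the data is made up to contain a tie in field 2, and the keys are written one field at a time (-k1,1n -k2,2n -k3,3nr) so that each field is compared on its own.

```shell
# Hypothetical data with a tie: id 1 has the minimum 0.12 twice, with flags 0 and 1.
data=$(mktemp)
printf '%s\n' '1 0.12 0' '1 0.12 1' '1 0.5 0' '2 0.056 0' '11 0.3 1' > "$data"

# Keys: id ascending, value ascending, flag descending, each compared numerically.
sorted=$(LC_ALL=C sort -k1,1n -k2,2n -k3,3nr "$data")
echo "$sorted"

# The first line per id is now the minimum, preferring flag 1 on ties.
picked=$(printf '%s\n' "$sorted" | awk '!a[$1]++')
echo "$picked"
rm -f "$data"
```

Here id 2 sorts before id 11 because the first key is numeric, and for id 1 the record "1 0.12 1" comes before "1 0.12 0", so it is the one awk keeps.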

edited Jun 20 '16 at 18:49

answered Jun 20 '16 at 15:25


Can you explain the code ' !a[$1]++ ' ? I don't get this part. – Murlidhar Fichadia Jun 20 '16 at 18:19
Also, please explain sort -k1,2n -k3r. I googled and it seems -k1 is the field that we are using to sort, and -k3r is the field that we are sorting in reverse order. But could you explain? Too many things are happening at once. – Murlidhar Fichadia Jun 20 '16 at 18:42
The best way to learn these is to test different flags on simple files. – karakfa Jun 20 '16 at 18:45
Thank you so much, I needed this explanation to understand the working of it. – Murlidhar Fichadia Jun 20 '16 at 18:53
