Sort and keep a unique duplicate which has the highest value

Question

I have a file like the one shown below. For each combination of the first and second fields, I want to keep only the line with the highest value in the third field (the lines marked with arrows; the arrows are not part of the actual file).

1   1   10
1   1   12        <- 
1   2   6         <-
1   3   4         <- 
2   4   32
2   4   37
2   4   39
2   4   40        <- 
2   45  12
2   45  15        <- 
3   3   12
3   3   15
3   3   17
3   3   19        <- 
3   15  4
3   15  9         <- 
4   17  25
4   17  28
4   17  32
4   17  36        <- 
4   18  4         <-

To get output like this:

1   1   12
1   2   6
1   3   4
2   4   40
2   45  15
3   3   19
3   15  9
4   17  36
4   18  4

I thought I could just use the sort and uniq commands, but I made a mess of it.

Any ideas?

Very important note: the entries in the real file are not sorted like this; I sorted them here with sort -k1,1 -k2,2 -k3,3.

Thanks in advance guys

`sort -r -k1 -k2 -k3 x.txt | awk '{if($i!=l1 && $2!=l2)print $0; l1=$1; l2=$2;}'` — Jerry Jeremiah, Apr 02 '14 at 20:42
Pretty clever, Jerry. I had to add a bit to the sort: I needed to specify that it sorts by numeric value, so it ended up like this: `sort -r -nk1,1 -nk2,2 -nk3,3 | awk '{if($i!=l1 && $2!=l2)print $0; l1=$1; l2=$2;}'`. Please post your solution as an answer so I can give you the green check. — Tamalero, Apr 02 '14 at 20:50
Give the check to one of the other answers. I didn't post it as an answer because it relies on an awk script, and your comment about sort and uniq made it clear that a real answer should use just standard commands like those. I posted it as a comment in case no one else supplied an answer, so that you would have something that worked. — Jerry Jeremiah, Apr 02 '14 at 20:58
could you explain how awk '{if($i!=l1 && $2!=l2)print $0; l1=$1; l2=$2;}' works? — Fırat Uyulur, Jul 27 '20 at 11:16
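How the one-liner works: awk runs the `{...}` block once per input line. `l1=$1; l2=$2` remember the current line's first two fields so the next line can be compared with them, and a line is printed when its key differs from the previous one. Since the input is sorted so that the largest third field comes first within each (first, second) group, the printed line is that group's maximum. Note that `i` is never assigned, so `$i` is `$0` (the whole line), which almost never equals the previous first field; the test therefore reduces to `$2!=l2`, and a line whose first field changes while the second stays the same would be skipped. Writing `$1` with `&&` would skip even more lines; "either field changed" needs `||`. Below is a minimal sketch with that fix, an explicit numeric key sort (ascending groups, descending third field) in place of `sort -r`, and an `NR == 1` guard so the first line is always printed; `keep_max_awk` is a hypothetical helper name.

```shell
# keep_max_awk: hypothetical helper name. Reads blank-separated
# "field1 field2 field3" lines on stdin and prints, for each
# (field1, field2) pair, the line with the largest numeric field3.
keep_max_awk() {
  # Fields 1 and 2 ascending, field 3 descending (all numeric), so the
  # first line of every group carries the maximum.
  sort -k1,1n -k2,2n -k3,3nr |
  awk '
    # Print the first line, and any line whose (field1, field2) key
    # differs from the key of the line before it.
    NR == 1 || $1 != prev1 || $2 != prev2 { print }
    # Remember the current key for the next comparison.
    { prev1 = $1; prev2 = $2 }
  '
}

printf '%s %s %s\n' 4 17 25 1 1 10 4 17 36 1 1 12 3 15 9 3 15 4 | keep_max_awk
```

On that sample, the lines kept are `1 1 12`, `3 15 9` and `4 17 36`.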

score 4 · Accepted Answer · answered Apr 02 '14 at 20:46


This is a bit funny, but:

sort -nr myfile.txt | rev | uniq -f1 | rev | sort -n

Output:

How it works:

1. Sort in reverse numeric order, so the highest values come first (and are the ones kept).
2. Reverse each line, so the last field comes first (needed for uniq).
3. Keep only the first of each run of lines that match when the first field (formerly the last field) is ignored.
4. Reverse each line back to its original order.
5. Sort the lines from low to high again.

Probably not the most efficient in the world, but at least each step makes some sense.
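A caveat on step 1: with no `-k` option, `sort -n` compares only the number at the start of each line (the first field), and ties are broken by comparing whole lines as strings, so the largest third field is not guaranteed to come first (for example, `2 4 9` would be placed ahead of `2 4 40`). A minimal sketch of the same rev/uniq idea with an explicit key sort, assuming three fields separated consistently by blanks; `keep_max_rev` is a hypothetical name. Because the key sort already leaves the groups in ascending order, the final `sort -n` is dropped.

```shell
# keep_max_rev: hypothetical helper name. Same rev/uniq trick as the
# answer, but with an explicit key sort so the maximum comes first.
keep_max_rev() {
  # Fields 1 and 2 ascending, field 3 descending (all numeric).
  sort -k1,1n -k2,2n -k3,3nr |
    # Reverse characters: the third field becomes the first.
    rev |
    # Ignore that first field; keep only the first line of each run.
    uniq -f1 |
    # Restore the original character order.
    rev
}

printf '%s\t%s\t%s\n' 2 4 9 2 4 40 1 1 10 1 1 12 2 45 15 | keep_max_rev
```

Here the `(2, 4)` group correctly keeps `40` rather than `9`.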


beroe


I like this answer best! It does exactly what he wanted: it uses sort and uniq – Jerry Jeremiah Apr 02 '14 at 20:54
Thanks JJ. I liked your approach in the comment, but couldn't actually get it to work. – beroe Apr 02 '14 at 20:56
Just curious: why did you have problems with my idea? (It worked for me when I tried it.) – Jerry Jeremiah Apr 03 '14 at 03:07

Unfortunately, `uniq -f` is a nonportable extension. – tripleee Feb 11 '16 at 09:59

iruvar · Answer 2 · 2014-04-03T14:21:35.447


Two passes of sort should do it; for example, in the bash shell:

sort -k1,1n -k2,2n -k3,3nr -t$'\t'  file  | sort -k1,1n -k2,2n -t$'\t' -u -s
1       1       12
1       2       6
1       3       4
2       4       40
2       45      15
3       3       19
3       15      9
4       17      36
4       18      4
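In the first pass, the keys group lines by the first two fields and order the third field from high to low within each group. In the second pass, `-u` outputs only the first of each run of lines whose first two fields compare equal, and `-s` requests a stable sort so that, at least with GNU sort, the line kept is the one pass one put first. A minimal sketch wrapping the same two passes in a function; `keep_max_sort` is a hypothetical name, and the tab is produced with `printf` instead of bash's `$'\t'` so the sketch also runs under a plain POSIX shell.

```shell
# keep_max_sort: hypothetical helper wrapping the two-pass sort above.
# Expects tab-separated "field1 field2 field3" lines on stdin.
keep_max_sort() {
  tab=$(printf '\t')
  # Pass 1: fields 1 and 2 ascending, field 3 descending (all numeric).
  sort -t "$tab" -k1,1n -k2,2n -k3,3nr |
  # Pass 2: compare fields 1 and 2 only; -u keeps one line per key and
  # -s keeps the pass-1 order within a key, so that line is the maximum.
  sort -t "$tab" -k1,1n -k2,2n -u -s
}

printf '%s\t%s\t%s\n' 3 3 12 3 15 4 3 3 19 3 15 9 3 3 17 | keep_max_sort
```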


answered Apr 02 '14 at 20:52


@beroe, perhaps you have space delimiters and not tabs? The command above needs tab-delimited input. As you can see from the output I have pasted in, it matches what the OP is looking for. I am using GNU sort on Linux – iruvar Apr 02 '14 at 21:02
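If the file is aligned with spaces, as in the question, another option is to drop `-t` altogether: with GNU sort, by default each run of blanks (spaces or tabs) is treated as part of the following field, and `-n` skips those leading blanks, so the same keys work with either delimiter. A minimal sketch under that assumption; `keep_max_blank` is a hypothetical name.

```shell
# keep_max_blank: hypothetical helper name. Same two passes as the answer,
# but without -t, so any run of spaces or tabs separates fields.
keep_max_blank() {
  # Pass 1: group by fields 1 and 2, field 3 descending.
  sort -k1,1n -k2,2n -k3,3nr |
  # Pass 2: one line per (field1, field2), keeping the pass-1 first line.
  sort -k1,1n -k2,2n -u -s
}

# Space-aligned sample in the style of the question.
printf '%s\n' '2   4   32' '2   45  12' '2   4   40' '2   45  15' | keep_max_blank
```

The original spacing is preserved in the output, since sort never rewrites the lines.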

score 0 · Answer 3 · edited May 06 '21 at 08:38

sort -nr myfile.txt | rev | uniq -f1 -f2 -f3 -f4 | rev | sort -n

The above worked very well for a file where I needed to sort 4 columns and display the highest values of fields 2 and 4:

STORE402        27         8          1          21-04-2021_07:55:01
STORE402        34         8          3          19-04-2021_11:40:01
STORE402        34         8          3          19-04-2021_15:05:01
STORE402        36         8          4          21-04-2021_12:05:01
STORE402        40         8          5          19-04-2021_12:20:01
STORE402        43         8          6          20-04-2021_10:40:01

Output after running the command:

STORE402        43         8          6          20-04-2021_10:40:01
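About the repeated `-f` options: as far as I can tell, `uniq` (GNU coreutils at least) keeps only the last `-f` value rather than adding them up, so `uniq -f1 -f2 -f3 -f4` behaves like `uniq -f4`. After `rev`, skipping four fields of a five-column line leaves only the reversed store name to compare, so for this sample a single line survives: the one `sort -nr` put first. Because every line starts with text, `-n` treats them all as 0 and the reversed whole-line comparison decides, which here puts the `43` row first. A sketch that checks this on the sample rows, re-typed as tab-separated since the exact spacing is an assumption:

```shell
# Sample rows from the answer above, re-typed as tab-separated (assumption).
sample() {
  printf '%s\t%s\t%s\t%s\t%s\n' \
    STORE402 27 8 1 21-04-2021_07:55:01 \
    STORE402 34 8 3 19-04-2021_11:40:01 \
    STORE402 34 8 3 19-04-2021_15:05:01 \
    STORE402 36 8 4 21-04-2021_12:05:01 \
    STORE402 40 8 5 19-04-2021_12:20:01 \
    STORE402 43 8 6 20-04-2021_10:40:01
}

# The command as posted, and the same pipeline with a single -f4.
as_posted=$(sample | sort -nr | rev | uniq -f1 -f2 -f3 -f4 | rev | sort -n)
single_f4=$(sample | sort -nr | rev | uniq -f4 | rev | sort -n)

printf '%s\n' "$as_posted"
[ "$as_posted" = "$single_f4" ] && echo "same output"
```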
