
I have a very large tab-separated file, part of which looks like this:

33  x   171 297 126
4   x   171 300 129
2   x   171 303 132
11  y   163 289 126
5   y   163 290 127
3   y   163 291 128
2   y   163 292 129
2   y   170 289 119
2   z   166 307 141
2   z   166 308 142
6   z   166 309 143
4   z   166 329 163
2   z   166 330 164

I want to sort the file and keep only one line for each key in the second column (x, y, z): the line with the highest value in the first column. How can I do this on Unix?

Hunter McMillen
kaur

1 Answer


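You can do this with awk. The script below remembers, for each key in the second column, the line with the largest value seen so far in the first column, and prints those lines at the end: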

awk '
{
  key = $2;
  flag = 0;
  if (key in value) { max = value[key] ; flag = 1 };
  if (flag == 0 || max < $1) { value[key] = $1; line[key] = $0 };
}
END {
  for (key in line) { print line[key] };
}
' data.tsv
Andrey
  • You don't need the flag and max. Remove lines 2 and 3 in the first block, and change the if in line 4 to `if ( value[key] < $1)`. – ULick May 02 '17 at 21:49
  • @ULick if the first column contains negative numbers, your version might not work properly (default values are 0 and ""). – Andrey May 02 '17 at 23:10
  • True. That can be solved with `if ( ! (key in value) || $1 > value[key] )`. Still no flag, but at some cost to readability. – ULick May 03 '17 at 17:06
  • This works perfectly: `perl -lanE '($v,$k)=@F[0..1];$h{$k}=$_,$j{$k}=$v if $j{$k}<$v;END{say for values %h}' file` – kaur May 03 '17 at 19:52
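
For reference, here is a minimal sketch that folds the comment suggestions into one script: no flag, and an explicit `in` test so that negative values in the first column still work. The function name `max_per_key` is made up for illustration, and the `-F'\t'` option assumes the file really is tab-separated, as the question says. On ties the first line seen is kept.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the thread):
# for each key in column 2, keep the line whose column 1 is numerically largest.
max_per_key() {
  awk -F'\t' '
    # New key, or a strictly larger first column than the stored one:
    # remember the value and the whole line.
    !($2 in best) || $1 + 0 > best[$2] {
      best[$2] = $1 + 0
      line[$2] = $0
    }
    END {
      # awk does not define the order in which "for (k in line)" visits keys.
      for (k in line) print line[k]
    }
  ' "$@"
}

# Usage (reads the named files, or stdin if none are given):
#   max_per_key data.tsv | sort -k2,2
```

Because the END loop visits keys in an unspecified order, piping the result through `sort -k2,2` is one way to get a stable ordering by key.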