I have some data with four columns, and I want to keep only the lines that are unique in the first 3 columns. Where several lines share the same first 3 columns, I want to keep the one with the maximum value in the fourth column.

My data looks like the following:

chr1    5   10  1.5
chr1    5   10  0.1
chr3    7   15  5
chr3    7   15  2
chr8    10  20  3

Could you please help me achieve this? I need the output to look like the following:

chr1    5   10  1.5
chr3    7   15  5
chr8    10  20  3
Sir. Hedgehog
Naresh DJ
  • You forgot to post your code. StackOverflow is about helping people fix their code. It's not a free coding service. Any code is better than no code at all. Did you experiment with `sort -k*n* -u` or `uniq -f*n*` ? Good luck. – shellter Jun 16 '16 at 01:26
  • No code, no information on the shell (ksh, bash, etc.) or the OS, and I could keep going. Make your question complete if you want someone to take the time to help you. – Sir. Hedgehog Jun 16 '16 at 09:23
  • I did try sort and uniq, but they do not give the desired output. `sort -u -t \t -k 1,1 -k 2,2 -k 3,3` simply gives the sorted output and does not remove the duplicates based on the first 3 columns. – Naresh DJ Jun 16 '16 at 17:30

1 Answer

Easy enough with sort only.

sort -k1,3 -u -t' ' input.txt

-k1,3 sorts on a key spanning columns 1 to 3

-u keeps only one line per unique key

-t sets the field delimiter (here a single space)
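
Note that `-u` on its own keeps one line per key, but it does not choose that line by the value in the fourth column. The following is a minimal sketch of one way to keep the maximum, assuming the columns are separated by whitespace (spaces or tabs) and the fourth column is a plain decimal number. The function name `max_by_key` is hypothetical and used only for illustration. It sorts so that the largest fourth value comes first within each key, then keeps the first line seen for each key.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): for each key made of
# columns 1-3, keep the line whose fourth column is numerically largest.
# Assumes whitespace-separated columns and a plain decimal fourth column.
max_by_key() {
    # LC_ALL=C makes "." the decimal point regardless of the user's locale.
    # -k4,4nr sorts column 4 numerically, descending, so that within each
    # key the maximum value comes first.
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n -k4,4nr "$@" |
        # Print a line only the first time its three-column key is seen.
        awk '!seen[$1, $2, $3]++'
}
```

It could be called as `max_by_key input.txt`, or fed from a pipe. The surviving lines are printed unchanged, so the original spacing is preserved.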

sumitya
  • It does not give the output I showed; it simply sorts the data. – Naresh DJ Jun 16 '16 at 17:31
  • @NareshDJ - Here I assume the delimiter is `' '` (one space). It works with that; check it, or change the delimiter to suit your data. – sumitya Jun 16 '16 at 18:11
  • Yes, I tried with the \t tab separator, but it gives just the sorted data. Did you get the output I showed in the post? Thank you for your help. – Naresh DJ Jun 17 '16 at 09:10
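
One possible reason the tab attempt only sorted the data: an unquoted `\t` on the command line is reduced by the shell to the letter `t` before sort sees it, so sort splits fields on "t" rather than on tabs. Below is a small sketch of passing a real tab character, assuming the data is actually tab-separated (the thread does not confirm this). The function name `sort_unique_tab` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: deduplicate on columns 1-3 of tab-separated input.
# Assumes the input really uses tabs; the thread does not confirm this.
sort_unique_tab() {
    # printf '\t' produces a literal tab, which is then passed to -t.
    # Writing -t \t unquoted would hand sort the letter "t" instead.
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -u -k1,1 -k2,2 -k3,3 "$@"
}
```

This only removes lines that duplicate the first three columns. Which duplicate survives is not decided by the fourth column, so keeping the maximum still needs an additional step.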