
I have a big tab-delimited text file and I want to remove every row whose values are all the same, keeping only the rows that have at least one differing value:

File.txt

Gen1    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen2    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen3    1.0 1.0 1.0 5.0 0.55    1.0 1.0 1.0 1.0
Gen4    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen5    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen6    0.4353  1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Output.txt

Gen3    1.0 1.0 1.0 5.0 0.55    1.0 1.0 1.0 1.0
Gen6    0.4353  1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Unfortunately, I could not obtain the expected output using the following commands:

perl -ne 'print if ! $a{$_}++'

sort -u and uniq do not work either, because the first column is different on every row, so no two lines are ever exact duplicates.
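
For comparison, a minimal awk sketch (not one of the commands tried above) that produces the expected Output.txt by printing only the rows in which at least one value field (columns 2 onward) differs from the others:

awk -F '\t' '{ for (i = 3; i <= NF; i++) if ($i != $2) { print; next } }' File.txt

The loop compares every value field against field 2 and prints the row as soon as one of them differs, so rows whose values are all identical (Gen1, Gen2, Gen4, Gen5) are skipped.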

  • Can you clarify: are the Gen1, Gen2, Gen4 and Gen5 rows excluded from the output because all the values in those rows are the same? Your expected output doesn't match the answer you accepted... – Sundeep Oct 20 '16 at 10:16
  • It's funny how a question about "removing rows with same values" was closed as a duplicate. :D – anishsane Oct 20 '16 at 11:26
  • That too when it is not a duplicate (at least not of the question it was marked as)... Can someone reopen the question? – Sundeep Oct 20 '16 at 11:49

1 Answer


Use sort with the key taken from the second field through to the end of the line (-k2):

sort -t$'\t' -uk2 file.txt
  • -t$'\t' sets the field delimiter to a literal Tab character ($'\t' is the shell's ANSI-C quoting; a plain '\t' would reach sort as two characters)

  • -u outputs only one line from each set of lines that compare equal on the selected key (field 2 through the end of the line)

Example:

% sort -uk2 file.txt
Gen6 0.4353 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Gen3 1.0 1.0 1.0 5.0 0.55 1.0 1.0 1.0 1.0
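
If the original row order matters, a roughly equivalent awk sketch (a supplement to this answer, not part of it) deduplicates on everything after the first tab without sorting:

awk -F '\t' '!seen[substr($0, index($0, FS))]++' file.txt

Here substr($0, index($0, FS)) is the line from the first tab onward, so two rows count as duplicates whenever their value fields match, and only the first occurrence of each distinct set is printed (Gen1, Gen3, Gen6 for the sample input).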
heemayl