26

I have a file that looks like this:

2011-03-21 name001 line1
2011-03-21 name002 line2
2011-03-21 name003 line3
2011-03-22 name002 line4
2011-03-22 name001 line5

For each name, I only want its last appearance. So I expect the result to be:

2011-03-21 name003 line3
2011-03-22 name002 line4
2011-03-22 name001 line5

Could someone give me a solution with bash/awk/sed?

Dagang

4 Answers

39

This gets the unique lines by the second field, but working from the end of the file, so the last appearance of each name wins (as in your expected result):

tac temp.txt | sort -k2,2 -r -u
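
For reference, on the sample input this reproduces the expected result (assuming GNU coreutils, whose `sort -u` keeps the first line it sees for each key):

    2011-03-21 name003 line3
    2011-03-22 name002 line4
    2011-03-22 name001 line5

`-k2,2` restricts the sort key to the second field, `-u` keeps only one line per key, and reversing the file with `tac` first makes that kept line the last appearance in the original file; `-r` only reverses the output order and happens to match the ordering asked for in the question.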
PaulP
  • Make sure that the last line of your input file ends with a \n, otherwise tac will concatenate it with the last but one line – Rishi Dua Jul 08 '14 at 17:38
  • To specify another separator, use -t: `tac temp.txt | sort -k1,1 -r -u -t@` – Simon Lang Apr 18 '17 at 20:56
  • Would you mind explaining the sort parameters `-k2,2`? :) – myradio Nov 03 '19 at 13:03
  • @myradio There is a good description on Wikipedia [here](https://en.wikipedia.org/wiki/Sort_(Unix)#Columns_or_fields) and [here](https://en.wikipedia.org/wiki/Sort_(Unix)#Sort_on_multiple_fields) – PaulP Nov 25 '19 at 06:17
11
awk '{a[$2]=$0} END {for (i in a) print a[i]}' file

If order of appearance is important:

  • Based on first appearance:

    awk '!a[$2] {b[++n]=$2} {a[$2]=$0} END {for (j=1; j<=n; j++) print a[b[j]]}' file
    
  • Based on last appearance:

    tac file | awk '!a[$2] {b[++n]=$2; a[$2]=$0} END {for (j=n; j>=1; j--) print a[b[j]]}'
    
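As the comments below note, the simplest variant can also just be piped through `sort`; the asker ended up sorting on the timestamp field afterwards. A minimal sketch, assuming the timestamp stays in the first field:

    awk '{a[$2]=$0} END {for (i in a) print a[i]}' file | sort -k1,1
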
pepoluan
  • This is good - simple and robust. The order of the output does not match the order of the input though, if that is important. Is there an easy way to fix that? – Paul Mar 25 '11 at 08:11
  • @Paul yes, but this will result in a much more complex awk program. I'll edit my answer. – pepoluan Mar 25 '11 at 08:12
  • Actually, I meant just reversing the printing of the array rather than changing which entry is selected, so that the output would be in time order: line 3, line 4, line 5 rather than line 5, line 4, line 3. +1 from me for the first simple answer. Oh wait, yeah - I see that is what you were doing - it does get stupidly complex. – Paul Mar 25 '11 at 08:24
  • @Paul oh, I misunderstood :) ... well, you can always pipe its output to `sort`; that would be much simpler than trying to cram everything into `awk`. – pepoluan Mar 25 '11 at 08:26
  • I used the simplest one and added a sort on the timestamp field after that. Really a good solution, thanks! – Dagang Mar 25 '11 at 10:19
6
sort < bar > foo
uniq  < foo > bar

bar now has no duplicated lines
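
For what it's worth, roughly the same effect can be had in one step, plus a variant that restricts uniqueness to the name field (as the comment below points out); a small sketch using standard sort options:

    sort -u bar > foo          # roughly equivalent to sort | uniq on whole lines
    sort -u -k2,2 bar > foo    # unique by the second field only; keeps one line per name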

nkvnkv
  • Given the OP's example, all the lines would be counted as unique. He only wants the second field to be used to determine uniqueness. – gdw2 Mar 01 '12 at 15:13
  • +1 ...but this answers the title ('bash eliminate duplicate lines' at the moment), which is what Google seemed to use to send me here! – sage Dec 27 '13 at 23:26
3

EDIT: Here's a version that actually answers the question.

tac filename | sort -s -k2,2 | while read f1 f2 f3; do if [ "$f2" != "$lf2" ]; then echo "$f1 $f2 $f3"; lf2="$f2"; fi; done
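
On the sample input this prints one line per name, ordered by the name field rather than by position in the original file:

    2011-03-22 name001 line5
    2011-03-22 name002 line4
    2011-03-21 name003 line3
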
Erik