
Here is the sample file with the duplicate lines:

abs
bsa
bsc
abs
bsa
bsb

Here is what the output should be (no duplicates):

abs
bsa
bsc
bsb

I tried the uniq -u command, but it removes every line that has a duplicate instead of keeping one copy, so would it be better to use sed or awk? Any suggestions?

Thanks!

    Use a programming language which offers associative arrays (a.k.a. hashes) or sets. Store each line into the set/hash, when you encounter it the first time. You output the line only if it has not been in your array before. You **can** do it in *bash* (search the man-page for associative arrays) or *awk* (where **every** array is associative), or in nearly any other language (*zsh*, *Ruby*, *Perl*, ....). A problem could be if the input is so huge that you run out of memory. – user1934428 Aug 11 '17 at 06:32
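A minimal sketch of the bash approach the comment describes, assuming bash 4 or later (associative arrays) and input on stdin; the filename `dedupe.sh` is just for illustration:

```shell
#!/usr/bin/env bash
# Print each input line only the first time it appears, using a
# bash associative array as a set (requires bash 4+).
declare -A seen
while IFS= read -r line; do
  # ${seen[$line]+x} expands to "x" only if the key already exists
  if [[ -z ${seen[$line]+x} ]]; then
    seen[$line]=1
    printf '%s\n' "$line"
  fi
done
```

Run as `bash dedupe.sh < yourfile`. As the comment warns, every distinct line is held in memory, so this can be a problem for very large inputs.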

1 Answer


Note: `uniq` does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use `sort -u` without `uniq`:

sort -u yourfile

The output:

abs
bsa
bsb
bsc
RomanPerekhrest
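Note that `sort -u` also sorts the output (abs, bsa, bsb, bsc), which differs from the first-occurrence order shown in the question (abs, bsa, bsc, bsb). If the original order matters, the classic awk idiom based on the associative-array approach from the comment keeps each line's first occurrence in place:

```shell
# 'seen' is an awk associative array keyed by the whole line ($0).
# '!seen[$0]++' is true only the first time a line is encountered,
# and awk's default action for a true pattern is to print the line.
awk '!seen[$0]++' yourfile
```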