
Suppose I have a file that contains a number of lines, some repeating:

line1
line1
line1
line2
line3
line3
line3

What Linux command(s) should I use to generate a list of the unique lines:

line1
line2
line3

Does this change if the file is unsorted, i.e. repeating lines may not be in blocks?

I Z

4 Answers


If you don't mind the output being sorted, use

sort -u

This sorts the file and removes duplicate lines in one step.
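For instance, on the sample file from the question (a sketch; the file name test1.txt is assumed):

```shell
# Build the sample file from the question (file name is assumed)
printf 'line1\nline1\nline1\nline2\nline3\nline3\nline3\n' > test1.txt

# Sort and drop duplicate lines in one step
sort -u test1.txt
# line1
# line2
# line3
```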

parkydr

cat to output the contents, piped to sort to sort them, piped to uniq to print out the unique values:

cat test1.txt | sort | uniq

You can skip the sort step if the file contents are already sorted.
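This works on an unsorted file too, since sort groups the repeats before uniq sees them (a sketch; the file name is assumed, and sort can also read the file directly, so the cat is optional):

```shell
# Unsorted sample: repeats are not in blocks
printf 'line3\nline1\nline2\nline1\nline3\n' > test1.txt

cat test1.txt | sort | uniq
# line1
# line2
# line3

sort test1.txt | uniq    # same result, without the extra cat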

go-oleg

Create a new, sorted file containing only unique lines:

sort -u file > unique_file

Create a new file with duplicates removed and the original order preserved (note that uniq only removes adjacent duplicates, so this gives a fully unique list only if the repeats occur in blocks):

uniq file > unique_file
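A quick sketch of both variants on a small, already-sorted sample (the file names are assumed):

```shell
# Already-sorted sample, repeats in blocks
printf 'line1\nline1\nline2\nline3\nline3\n' > file

sort -u file > unique_file     # sorted, duplicates removed
uniq file > unique_file2       # adjacent duplicates removed, order preserved

cat unique_file
# line1
# line2
# line3
```

On sorted input both files end up identical; they differ only when the input is unsorted.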
Kevin Sabbe

If we do not care about the order, then the best solution is actually:

sort -u file

If we also want to ignore case when comparing lines, we can use this (only one line from each case-insensitive group is kept; the letters themselves are not changed):

sort -fu file
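For example (sample data assumed; which case variant survives each group is left to the implementation, so the exact output is not shown):

```shell
printf 'Apple\napple\nBANANA\nbanana\n' > fruits.txt

# -f folds case when comparing, -u keeps one line per group,
# so only two lines remain (one per fruit)
sort -fu fruits.txt
```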

It might seem that an even simpler idea would be to use the command:

uniq file

and if we also want to ignore case (the first line of each run of duplicates is kept, with its case unchanged):

uniq -i file

However, this may return a completely different result from the sort-based commands, because uniq does not detect repeated lines unless they are adjacent.
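A small sketch of the difference (the file name is assumed):

```shell
# The duplicate "line1" entries are NOT adjacent
printf 'line1\nline2\nline1\n' > f.txt

uniq f.txt        # second "line1" survives: uniq only compares neighbours
# line1
# line2
# line1

sort -u f.txt     # all duplicates removed, at the cost of sorting
# line1
# line2
```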

simhumileco