0

I am getting two different results from the following commands and want to know the difference between them. I want to list the unique lines in a file (remove duplicates). To do that, I used the following commands.

sort -u filename

and

sort filename|uniq -u

I got two different results. Can someone explain the difference?

I also tried this command and got yet another result, different from the two above.

cat filename|uniq -u
Charles Duffy
Hideandseek
  • please provide a minimal input file which reproduces the problem. – Karoly Horvath Oct 05 '14 at 15:16
  • Also, define the active locale -- sort orders and behavior differ based on localization settings. If you can reproduce the behavior with `export LC_ALL=C`, all the better. – Charles Duffy Oct 05 '14 at 15:18
  • @KarolyHorvath Here is the dropbox link to download the text file which I tested. https://www.dropbox.com/s/kqw19g1wxjsqhxj/abc.txt?dl=0 – Hideandseek Oct 05 '14 at 15:25
  • @all. try with wc -l command , it will show you the number of lines difference between two commands. – Hideandseek Oct 05 '14 at 15:27
  • http://www.merriam-webster.com/dictionary/minimal – Karoly Horvath Oct 05 '14 at 16:03
  • from uniq manual: Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'. – whoopdedoo Aug 16 '17 at 11:40

2 Answers

2

The file I use:

zsh/6 31167 % cat do_sortowania  
Marcin
Tomek
Marcin
Wojtek
Zosia
Zosia
Marcin
Krzysiek

using sort:

zsh/6 31168 % sort -u do_sortowania 
Krzysiek
Marcin
Tomek
Wojtek
Zosia

but using sort + uniq:

zsh/6 31170 % sort do_sortowania|uniq -u
Krzysiek
Tomek
Wojtek

Now, two answers. Short:

zsh/6 31171 % sort do_sortowania|uniq -c
      1 Krzysiek
      3 Marcin
      1 Tomek
      1 Wojtek
      2 Zosia

Long: As you can see, uniq -u returns only the lines that appear exactly once: Krzysiek, Tomek, Wojtek.
Marcin and Zosia appear 3 and 2 times, so uniq -u omits them.
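
To make the difference concrete, here is a small sketch that runs the variants side by side. It assumes a POSIX shell and uses the same names as the sample file; the variable names are made up for illustration, and LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# Sketch: compare the ways of reducing a sorted list.
# LC_ALL=C pins the collation order so the output is predictable.
export LC_ALL=C

# Sample data, same names as in the example file (variable names are illustrative).
sample=$(printf '%s\n' Marcin Tomek Marcin Wojtek Zosia Zosia Marcin Krzysiek)

# One copy of every distinct line.
distinct=$(printf '%s\n' "$sample" | sort -u)

# Only the lines that occur exactly once.
only_once=$(printf '%s\n' "$sample" | sort | uniq -u)

# Only the lines that occur more than once (one copy of each).
repeated=$(printf '%s\n' "$sample" | sort | uniq -d)

printf 'sort -u:\n%s\n\n' "$distinct"
printf 'sort | uniq -u:\n%s\n\n' "$only_once"
printf 'sort | uniq -d:\n%s\n' "$repeated"
```

So uniq -u and uniq -d split the output of sort -u into two groups: the lines that were never repeated, and the lines that were.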

P.S.

zsh/6 31172 % cat do_sortowania|uniq -u
Marcin
Tomek
Marcin
Wojtek
Marcin
Krzysiek

This is because uniq only works as expected on sorted input. With plain uniq (no options), for example:

Marcin
Marcin
Tomek

will be reduced to

Marcin
Tomek

but

Marcin
Tomek
Marcin

won't, because uniq compares each line only to the one next to it; it assumes the file is already sorted.
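
The adjacency rule can be checked with a few lines of shell. This is only a sketch, assuming a POSIX shell; the variable names are made up for the example.

```shell
#!/bin/sh
# Sketch: uniq only compares neighbouring lines.

# The two "Marcin" lines are not adjacent, so plain uniq keeps both.
unsorted=$(printf '%s\n' Marcin Tomek Marcin | uniq)

# Sorting first brings the duplicates together, so uniq can collapse them.
sorted_first=$(printf '%s\n' Marcin Tomek Marcin | LC_ALL=C sort | uniq)

printf 'uniq alone:\n%s\n\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted_first"
```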

1

Sort with no duplicate entries in the list:

sort -u filename

The | pipes the output of sort to uniq, which with -u prints only the lines that are not repeated:

sort filename|uniq -u

For most users, this is the right choice:

sort -u filename
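
If "one copy of each line" is what you are after, sort -u and sort piped into plain uniq (without -u) should agree when whole lines are compared and no key options are given. A quick sketch to check that, assuming a POSIX shell, with made-up input and variable names:

```shell
#!/bin/sh
# Sketch: sort -u versus sort | uniq on whole lines, no key options.
export LC_ALL=C

# Illustrative input with duplicates in random order.
input=$(printf '%s\n' b a b c a)

with_flag=$(printf '%s\n' "$input" | sort -u)
with_pipe=$(printf '%s\n' "$input" | sort | uniq)

if [ "$with_flag" = "$with_pipe" ]; then
    verdict=same
else
    verdict=different
fi

printf 'sort -u vs sort | uniq: %s\n' "$verdict"
```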

Ubuntu man page for sort: http://manpages.ubuntu.com/manpages/precise/en/man1/sort.1.html

c0utinh0