finding duplicates in a field and printing them in unix bash

Question

I have a file that contains:

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicates (apple and orange) and tells the user something like: apple and orange are repeated. I tried

nawk '!x[$1]++' FS="," filename

to find the repeated items. How can I print them out in Unix bash?

devnull · Answer 1 · 2013-07-29T06:59:58.293

11

To print the duplicate lines, sort the file and pipe it through uniq -d (uniq only compares adjacent lines, so the input has to be sorted first):

$ sort filename | uniq -d
apple
orange

If you want to print the count as well, supply the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
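
To turn that list into the one-line message the question asks for, the output of uniq -d can be joined with awk. This is only a sketch: the function name report_duplicates and the exact wording of the message are assumptions, not something the tools provide.

```shell
#!/bin/sh
# report_duplicates is a hypothetical helper name, chosen for illustration.
# $1 is the file to inspect, one item per line.
report_duplicates() {
    sort -- "$1" | uniq -d |
    awk '
        NR > 1 { printf " and " }                 # separator before every item but the first
               { printf "%s", $0 }                # the duplicated item itself
        END    { if (NR) print " are repeated" }  # stay silent when nothing repeats
    '
}
```

Called as report_duplicates filename on the sample file, this should print: apple and orange are repeated. When no line repeats, it prints nothing.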

edited Jul 29 '13 at 06:59

answered Jul 29 '13 at 06:52

devnull


Note that, according to uniq, the word "ウェイター" is a duplicate of "ウエイター" (ェ=エ) – asdjfiasd Oct 21 '15 at 09:54
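
That behaviour most likely comes from locale-aware collation: in some locales, distinct strings can compare as equal. If byte-exact matching is wanted, one common workaround is to force the C locale for both commands. A minimal sketch, assuming the installed sort and uniq honour LC_ALL (GNU coreutils do); dups_bytewise is a hypothetical name:

```shell
#!/bin/sh
# dups_bytewise is a hypothetical helper name.
# Forcing LC_ALL=C makes sort and uniq compare raw bytes instead of
# using the collation rules of the current locale.
dups_bytewise() {
    LC_ALL=C sort -- "$1" | LC_ALL=C uniq -d
}
```

The trade-off is that the sort order becomes plain byte order, which may differ from what the user's locale would produce.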

score 4 · Answer 2 · edited May 23 '17 at 12:13

+1 for devnull's answer. However, if the items in the file are separated by spaces instead of newlines, the following will work:

tr '[:blank:]' '\n' < filename | sort | uniq -d
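
One caveat with this approach: a run of several blanks turns into empty lines, and those empty lines can then show up as a "duplicate" themselves. A sketch of a variant that squeezes them with tr -s (the function name dups_in_words is made up for illustration):

```shell
#!/bin/sh
# dups_in_words is a hypothetical helper name.
# tr -s translates every blank (space or tab) to a newline and then
# squeezes runs of newlines into one, so no empty lines reach sort.
dups_in_words() {
    tr -s '[:blank:]' '\n' < "$1" | sort | uniq -d
}
```

The quotes around '[:blank:]' keep the shell from treating the brackets as a glob pattern.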

edited May 23 '17 at 12:13

Community

answered Jul 29 '13 at 07:00

Varun


hek2mgl · Answer 3 · 2013-07-29T07:12:02.530

1

Update:

The question has been changed significantly. When I wrote this answer, the input file looked like this:

apple apple banana orange apple orange
banana orange apple
...

The solution still works with the new format, although it is a little more complicated than this simpler case needs.

The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

apple 3
orange 2
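
Note that awk does not specify the order in which for (i in a) visits the keys, so the two lines may come out in either order. If a predictable order matters more than the counts, a streaming variant can print each item the moment it is seen for the second time. This is a sketch of that alternative, not the script above; dups_first_seen is a hypothetical name:

```shell
#!/bin/sh
# dups_first_seen is a hypothetical helper name.
# For every field on every line, bump its counter; the post-increment
# returns the old value, so the test is true exactly once per item,
# on its second occurrence.
dups_first_seen() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (seen[$i]++ == 1)
                print $i
    }' "$1"
}
```

Items are printed in the order in which their first repetition appears, and no sort is needed.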

It is more understandable in a form like this:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Save it as script.awk, make the file executable, and run it with the input file name as its argument:

chmod +x script.awk
./script.awk your.file

edited Jul 29 '13 at 07:12

answered Jul 29 '13 at 06:50

hek2mgl


+1. On attempting to format the question, it became evident that the input file had items placed on different lines. I agree that it was hard to *guess* that. – devnull Jul 29 '13 at 07:02
@devnull :) I guessed something like this.. however, now we have two solutions for two slightly different use cases. as a result, this is not so bad..... – hek2mgl Jul 29 '13 at 07:05
What if there are 2 fields? And how does it know which file it should search? – t28292 Jul 29 '13 at 07:08
@user2613272 Scroll the code block above to the right: the file name is given as the last argument. It should work with two fields as well, doesn't it? – hek2mgl Jul 29 '13 at 07:09
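
For the two-field case raised in the comments, and for the comma-separated input that the FS="," in the question's nawk attempt hints at, the counting can be restricted to a single column. A sketch only: dups_in_field, its argument order, and the comma default are assumptions for illustration.

```shell
#!/bin/sh
# dups_in_field is a hypothetical helper name.
# $1 = input file, $2 = 1-based field number, $3 = field separator
# (defaults to a comma, which is an assumption about the data).
dups_in_field() {
    awk -F"${3:-,}" -v col="$2" '
        { count[$col]++ }                 # count each value of the chosen field
        END {
            for (k in count)
                if (count[k] > 1)
                    print k, count[k]     # value and how often it occurred
        }
    ' "$1" | sort                         # for-in order is unspecified, so sort
}
```

For example, dups_in_field your.file 1 looks only at the first column, and dups_in_field your.file 2 ';' would look at the second column of a semicolon-separated file.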
