5

I have a file that contains:

apple
apple
banana
orange
apple
orange

I want a script that finds the duplicates, apple and orange, and tells the user something like: the following: apple and orange are repeated. I tried:

nawk '!x[$1]++' FS="," filename

to find the repeated items. How can I print them out in bash on Unix?

Chris Seymour
t28292

3 Answers

11

In order to print the duplicate lines, you can say:

$ sort filename | uniq -d
apple
orange

If you want to print the count as well, supply the -c option to uniq:

$ sort filename | uniq -dc
      3 apple
      2 orange
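
The question also asked for a message naming the repeated items. A minimal sketch of that, building on the sort | uniq -d pipeline above, might look like the following. The function name report_duplicates and the exact wording of the message are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# report_duplicates is a hypothetical helper name used only for this sketch.
# Usage: report_duplicates filename
report_duplicates() {
    # sort groups identical lines together; uniq -d keeps one copy of each
    # line that occurs more than once.
    dups=$(sort -- "$1" | uniq -d)

    if [ -z "$dups" ]; then
        echo "no duplicates found"
        return 0
    fi

    # Join the duplicate lines with " and " so they fit in one sentence.
    joined=$(printf '%s\n' "$dups" |
        awk 'NR > 1 { printf " and " } { printf "%s", $0 } END { print "" }')

    echo "the following: $joined are repeated"
}
```

Calling report_duplicates filename on the sample input from the question should print a single sentence listing apple and orange.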
devnull
4

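+1 for devnull's answer. However, if the file uses spaces instead of newlines as the delimiter, the following would work (the character class is quoted so the shell does not expand it, and -s squeezes runs of blanks so no empty lines are counted):

tr -s '[:blank:]' '\n' < filename | sort | uniq -d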
Varun
1

Update:

The question has been changed significantly. When I answered it, the input file looked like this:

apple apple banana orange apple orange
banana orange apple
...

The solution still works for the current input, but it may be more complicated than this special case needs.


The following awk script will do the job:

awk '{i=1;while(i <= NF){a[$(i++)]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' your.file

Output:

apple 3
orange 2

It is more understandable in a form like this:

#!/usr/bin/awk -f

{
  i=1;
  # iterate through every field
  while(i <= NF) {
    a[$(i++)]++; # count occurrences of every field
  }
}

# after all input lines have been read ...
END {
  for(i in a) {
    # ... print those fields which occurred more than 1 time
    if(a[i] > 1) {
      print i,a[i];
    }
  }
}

Save it as script.awk, make the file executable, and run it with the input file name as an argument:

chmod +x script.awk
./script.awk your.file  
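
For input with one item per line, as in the edited question, a shorter awk form may be enough. The sketch below is a different technique from the field-counting script above: it prints each repeated line once, at its second occurrence, so no END block or sort is needed. The wrapper name print_repeats is an assumption used only for illustration.

```shell
#!/bin/sh
# print_repeats is a hypothetical wrapper name used only for this sketch.
# Usage: print_repeats filename
print_repeats() {
    # seen[$0]++ evaluates to the count before the increment, so the
    # comparison is true exactly once per distinct line: when the line
    # is seen for the second time.
    awk 'seen[$0]++ == 1' "$1"
}
```

Unlike sort | uniq -d, this keeps the order in which the duplicates were detected instead of sorted order, and it compares whole lines rather than individual fields.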
hek2mgl
  • +1. On attempting to format the question, it became evident that the input file had items placed on different lines. I agree that it was hard to *guess* that. – devnull Jul 29 '13 at 07:02
  • @devnull :) I guessed something like this. However, now we have two solutions for two slightly different use cases, so this is not so bad. – hek2mgl Jul 29 '13 at 07:05
  • What if there are 2 fields? And how does it know which file it should search? – t28292 Jul 29 '13 at 07:08
  • @user2613272 Scroll the code field above to the right. You'll need to give the file name as an argument. It should work with two fields, doesn't it? – hek2mgl Jul 29 '13 at 07:09