How can I print the duplicates in a file only once?

Question

I have an input file that contains:

123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple

I want the output to be: The following numbers are repeated: 123 543

Is there a way to get this output using awk? I'm writing the script in Bash on Solaris.

Your question is to print only duplicate lines and only once. This title can be misleading. — mchid, Sep 02 '20 at 10:50

score 3 · Answer 1 · answered Aug 17 '13 at 16:48

sed -e 's/,/ , /g' filename | awk '{print $1}' | sort | uniq -d

iamauser

Philipp Claßen · Answer 2 · 2013-08-17T17:27:45.707

1

If you can live without awk, you can use this to get the repeating numbers:

cut -d, -f 1 my_file.txt  | sort | uniq -d

Prints

123
543
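The sort step matters because uniq -d only compares adjacent lines. A small sketch of the difference, using the sample data from the question written to a temporary file (the use of mktemp and the variable names are assumptions made for illustration):

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple
EOF

# Without sort, only adjacent repeats are detected (lines 1 and 2 here).
unsorted=$(cut -d, -f1 "$tmp" | uniq -d)

# With sort, equal keys become adjacent, so every repeated key is reported.
sorted=$(cut -d, -f1 "$tmp" | sort | uniq -d)

# Unquoted expansion joins the lines with single spaces.
echo "without sort:" $unsorted   # without sort: 123
echo "with sort:" $sorted        # with sort: 123 543

rm -f "$tmp"
```

Without the sort, 543 is missed because its two occurrences are separated by other lines.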

Edit: (in response to your comment)

You can buffer the output and decide if you want to continue. For example:

out=$(cut -d, -f 1 a.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
    echo "The following numbers are repeated: $out"
    exit 1    # non-zero status signals that duplicates were found
fi

# continue...
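The same check can also be wrapped in a function, so that the caller decides whether to abort. This is only a sketch: the function name report_dups and the demo file are hypothetical, and it uses [ -n ... ] rather than [[ ... ]] so that it should also run in a plain POSIX sh, in case the script is not actually being executed by Bash.

```shell
# report_dups FILE: print the repeated first-column values on one line.
# Returns 1 if duplicates were found, 0 otherwise.
report_dups() {
    dups=$(cut -d, -f1 "$1" | sort | uniq -d | tr '\n' ' ')
    if [ -n "$dups" ]; then
        # ${dups% } strips the trailing space left behind by tr.
        echo "The following numbers are repeated: ${dups% }"
        return 1
    fi
    return 0
}

# Demo with a small hypothetical input file.
tmp=$(mktemp)
printf '%s\n' 123,apple 543,grapes 123,orange > "$tmp"
report_dups "$tmp" || echo "duplicates found; a real script would exit here"
rm -f "$tmp"
```

In a real script the call would typically be: report_dups my_file.txt || exit 1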

edited Aug 17 '13 at 17:27

answered Aug 17 '13 at 16:47

What if I want the following message displayed before the numbers: "The following numbers are repeated: 123 543", and then abort the program; otherwise continue with the rest of the script normally? – t28292 Aug 17 '13 at 17:06
I tried it. When there is a duplicate, it displays the duplicated number, but followed by "command not found", for example: 123: command not found – t28292 Aug 17 '13 at 18:01
@user2613272 I'm not sure. I could only test under Linux, but maybe it is a Solaris issue. Are you sure that you are using Bash and not the system default (which could be another shell)? – Philipp Claßen Aug 17 '13 at 19:12
What do tr '\n' ' ' and -n $out do? – t28292 Aug 19 '13 at 19:30
@user2613272 `-n $out` tests whether the variable `out`, which holds the captured output of the pipeline, is non-empty. The `tr` command replaces newlines with spaces (to match your example output, where everything is on one line). – Philipp Claßen Aug 19 '13 at 20:16
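
To make the two pieces discussed in these comments concrete, here is a tiny standalone sketch. The variable names joined and empty are made up for the illustration:

```shell
# tr turns each newline into a space; $(...) only strips trailing newlines,
# so the trailing space produced by tr is kept.
joined=$(printf '123\n543\n' | tr '\n' ' ')
echo "[$joined]"    # prints [123 543 ]

# -n is true when the string has non-zero length.
empty=""
[ -n "$joined" ] && echo "joined is non-empty"
[ -n "$empty" ] || echo "empty has zero length"
```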

user000001 · Answer 3 · 2013-08-17T17:40:36.210

1

This script prints only the numbers in the first column that appear more than once:

awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file

Or in a bit shorter form:

awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
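
The shorter form relies on the post-increment: a[$1]++ yields the count from before the current line, so the comparison with 1 is true exactly on the second occurrence of a key, and each repeated key is printed once. A sketch that traces this on a made-up input (the variable names trace and before are only for illustration, and the ternary operator assumes a POSIX-style awk such as nawk, gawk, or mawk):

```shell
trace=$(printf '%s\n' 123,a 123,b 543,c 790,d 543,e 123,f |
    awk -F, '{
        before = a[$1]++    # number of earlier lines with this key
        printf "%s: seen %d time(s) before -> %s\n", $1, before, (before == 1 ? "print" : "skip")
    }')
echo "$trace"
# 123: seen 0 time(s) before -> skip
# 123: seen 1 time(s) before -> print
# 543: seen 0 time(s) before -> skip
# 790: seen 0 time(s) before -> skip
# 543: seen 1 time(s) before -> print
# 123: seen 2 time(s) before -> skip
```

A side effect is that this form reports keys in the order in which they first repeat, whereas the for (i in a) loop visits keys in an unspecified order.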

If you want to exit your script when a duplicate is found, you can have awk exit with a non-zero exit code. For example:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file

In your main script you can do:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file || exit 1

Or in a more readable format:

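awk -F, '
    # Set a flag the first time any key is seen a second time.
    a[$1]++ == 1 {
        dup = 1
    }
    END {
        if (dup) {
            printf "The following numbers are repeated: "
            for (i in a)
                if (a[i] > 1)
                    printf "%s ", i
            print ""
            exit 1    # exit statuses should be in the range 0-255
        }
    }
' file || exit 1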

edited Aug 17 '13 at 17:40

answered Aug 17 '13 at 16:47

Okay, it works perfectly, thank you. Now what if I want to abort the script in case there are repeated numbers? – t28292 Aug 17 '13 at 17:23
But there's one more problem: when there are no duplicates, the message "The following numbers are repeated" still shows. Is there a way to suppress that message when there are no duplicates? – t28292 Aug 17 '13 at 17:33
@user2613272 Updated. Now it does not print anything if no dups are found, and if at least one is found, it returns a non-zero exit code. You can use this in your script if you add a `... || exit` after the `awk` command, or if you use it in an `if` statement like `if ! awk '....' file; then exit; fi` – user000001 Aug 17 '13 at 17:42
It still displays the message "The following numbers are repeated" when there are no duplicated numbers. – t28292 Aug 17 '13 at 18:40
@user2613272 Try the last two scripts. They should display nothing if no dupes are found. – user000001 Aug 17 '13 at 18:55
@user2613272 Well, I tried it on my PC too and it doesn't... Who knows what's going on :/ – user000001 Aug 17 '13 at 19:07
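
The `if ! awk ...` form mentioned in the comments could look roughly like the sketch below. The function name check_file and the demo file are hypothetical, and the function uses return where a standalone script would use exit. As a guess about the asker's setup rather than anything confirmed in the thread: on Solaris, /usr/bin/awk is traditionally the older awk, so trying nawk or /usr/xpg4/bin/awk in its place may be worthwhile if the behavior differs from Linux.

```shell
# check_file FILE: report repeated first-column values, if any.
# Returns 1 when duplicates exist, otherwise prints a note and returns 0.
check_file() {
    if ! awk -F, '
        a[$1]++ == 1 { dup = 1 }
        END {
            if (dup) {
                printf "The following numbers are repeated:"
                # Note: for (i in a) visits keys in an unspecified order.
                for (i in a) if (a[i] > 1) printf " %s", i
                print ""
                exit 1
            }
        }' "$1"
    then
        return 1    # a standalone script would use: exit 1
    fi
    echo "no duplicates, continuing"
}

# Demo with a small hypothetical input file.
tmp=$(mktemp)
printf '%s\n' 123,apple 543,grapes 123,orange > "$tmp"
check_file "$tmp" || echo "aborting (status $?)"
rm -f "$tmp"
```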

score 1 · Answer 4 · answered Aug 17 '13 at 16:48

awk -v FS=',' '
    { KEY = $1; if (KEY in KEYS) DUPS[KEY]; KEYS[KEY] }
    END { print "Repeated Keys:"; for (i in DUPS) print i }
' < yourfile

There are solutions with sort/uniq/cut as well (see above).

M.E.L.
