
I have an input file that contains:

123,apple,orange
123,pineapple,strawberry
543,grapes,orange
790,strawberry,apple
870,peach,grape
543,almond,tomato
123,orange,apple

I want the output to be:

The following numbers are repeated: 123 543

Is there a way to get this output using awk? I'm writing the script in bash on Solaris.

t28292
  • Your question is really about printing only the duplicated values, each only once. The title can be misleading. – mchid Sep 02 '20 at 10:50

4 Answers

sed -e 's/,/ , /g' <filename> | awk '{print $1}' | sort | uniq -d
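The `sed` step above only pads each comma with spaces so that awk's default whitespace splitting puts the number in `$1`. A minimal sketch of the same pipeline that lets awk split on commas directly; the function name `first_field_dups` is only an illustrative placeholder:

```shell
#!/bin/bash
# Print the values of the first comma-separated column that occur more than once.
# $1: path to the input file (pass your own file).
first_field_dups() {
    awk -F, '{ print $1 }' "$1" |  # extract the number before the first comma
        sort |                     # uniq only compares adjacent lines, so sort first
        uniq -d                    # print one copy of each line that is repeated
}
```

Usage: `first_field_dups file` prints `123` and `543` on separate lines for the sample input.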
iamauser

If you can live without awk, you can use this to get the repeated numbers:

cut -d, -f 1 my_file.txt  | sort | uniq -d

This prints:

123
543

Edit: (in response to your comment)

You can capture the output in a variable and then decide whether to continue. For example:

out=$(cut -d, -f 1 my_file.txt | sort | uniq -d | tr '\n' ' ')
if [[ -n $out ]] ; then
    echo "The following numbers are repeated: $out"
    exit
fi

# continue...
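The same idea can also be packaged as a function that returns a status instead of calling `exit` itself, which leaves the decision to abort to the main script. This is a sketch under a few assumptions: the function name `check_dups` is hypothetical, and it swaps `tr '\n' ' '` for `paste -s -d ' ' -`, which joins the lines with single spaces without leaving a trailing space.

```shell
#!/bin/bash
# Print the repeated first-column numbers on one line and return 1 if any exist.
# $1: path to the input file.
check_dups() {
    local out
    # cut: keep field 1; sort + uniq -d: one copy of each repeated value;
    # paste -s -d ' ' -: join those lines with spaces (no trailing space).
    out=$(cut -d, -f 1 "$1" | sort | uniq -d | paste -s -d ' ' -)
    if [[ -n $out ]]; then
        echo "The following numbers are repeated: $out"
        return 1
    fi
    return 0
}
```

Usage: `check_dups my_file.txt || exit 1`, followed by the rest of the script.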
Philipp Claßen
  • What if I want the message "The following numbers are repeated: 123 543" to be displayed before the numbers, and then abort the program; otherwise continue with the rest of the script normally? – t28292 Aug 17 '13 at 17:06
  • I tried it. When there's a duplicate, it displays the duplicated number but then "command not found", for example "123: command not found". – t28292 Aug 17 '13 at 18:01
  • @user2613272 I'm not sure. I could only test under Linux, but maybe it is a Solaris issue. Are you sure that you are using Bash and not the system default (which could be another shell)? – Philipp Claßen Aug 17 '13 at 19:12
  • What do tr '\n' ' ' and -n $out stand for? – t28292 Aug 19 '13 at 19:30
  • @user2613272 `-n $out` tests whether the captured output in `$out` is non-empty. The `tr` command replaces newlines with spaces (to match your example output, where everything is on one line). – Philipp Claßen Aug 19 '13 at 20:16

This script prints only those numbers in the first column that appear more than once:

awk -F, '{a[$1]++}END{printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print ""}' file

Or in a bit shorter form:

awk -F, 'BEGIN{printf "Repeated "}(a[$1]++ == 1){printf "%s ", $1}END{print ""} ' file
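The shorter form works because `a[$1]++ == 1` is true exactly once per number: on its second occurrence. Combining that idiom with a flag gives a variant that prints the header only when a duplicate actually exists, and lists the numbers in input order (the `for (i in a)` loops visit keys in an unspecified order). A sketch, with `report_dups` as a made-up wrapper name:

```shell
#!/bin/bash
# Print "The following numbers are repeated: ..." only if some first-column
# value repeats; exit status 1 means duplicates were found, 0 means none.
# $1: path to the input file.
report_dups() {
    awk -F, '
        a[$1]++ == 1 {                 # true only on the 2nd occurrence of $1
            if (!found) {              # first duplicate: emit the header once
                printf "The following numbers are repeated:"
                found = 1
            }
            printf " %s", $1
        }
        END {
            if (found) {
                print ""               # terminate the output line
                exit 1                 # tell the calling shell "duplicates found"
            }
        }
    ' "$1"
}
```

Usage: `report_dups file || exit 1`. For the sample input this prints `The following numbers are repeated: 123 543` and returns 1.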

If you want to exit your script when a duplicate is found, you can make awk exit with a non-zero status. For example:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file

In your main script you can do:

awk -F, 'a[$1]++==1{dup=1}END{if (dup) {printf "The following numbers are repeated: ";for (i in a) if (a[i]>1) printf "%s ",i; print "";exit(1)}}' file || exit 1

Or in a more readable format:

awk -F, '
    a[$1]++==1{
        dup=1
    }
    END{
        if (dup) {
            printf "The following numbers are repeated: ";
            for (i in a) 
                if (a[i]>1) 
                    printf "%s ",i; 
            print "";
            exit(1)
        }
    }
' file || exit 1
user000001
  • Okay, it works perfectly, thank you. Now what if I want to abort the script in case there are repeated numbers? – t28292 Aug 17 '13 at 17:23
  • But there's one more problem: in case there are no duplicates, the message "The following numbers are repeated" still shows. Is there a way to not show that message when there are no duplicates? – t28292 Aug 17 '13 at 17:33
  • @user2613272 Updated. Now it does not print anything if no dups are found, and if at least one is found, it returns a non-zero exit code. You can use this in your script if you add a `... || exit` after the `awk` command, or if you use it in an `if` statement like `if ! awk '....' file; then exit; fi` – user000001 Aug 17 '13 at 17:42
  • It still displays the message "The following numbers are repeated" when there are no duplicated numbers. – t28292 Aug 17 '13 at 18:40
  • @user2613272 Try the last two scripts. They should display nothing if no dupes are found. – user000001 Aug 17 '13 at 18:55
  • @user2613272 Well, I tried it too on my PC and it doesn't... Who knows what's going on :/ – user000001 Aug 17 '13 at 19:07
awk -v FS=',' '
     { KEY = $1; if (KEY in KEYS) DUPS[KEY]; KEYS[KEY] }
     END { print "Repeated Keys:"; for (i in DUPS) print i }
' < yourfile

There are solutions with sort/uniq/cut as well (see above).
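The `(KEY in KEYS)` test records a key as a duplicate the second (and any later) time it is seen; merely referencing `DUPS[KEY]` is enough to create the element. Because `for (i in DUPS)` walks the array in an unspecified order, piping the result through `sort -n` gives stable output. A sketch of that variant, with the function name `repeated_keys` chosen only for illustration:

```shell
#!/bin/bash
# List each repeated first-column key once, in ascending numeric order.
# $1: path to the input file.
repeated_keys() {
    awk -F, '
        ($1 in seen) { dups[$1] }   # seen before: mark as duplicate (creates the element)
        { seen[$1] }                # remember every key
        END { for (k in dups) print k }
    ' "$1" | sort -n
}
```

Usage: `repeated_keys yourfile` prints `123` and `543` on separate lines for the sample input.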

M.E.L.