awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile
Yes, you give it the same file as input twice. Since you don't know ahead of time whether the current record is unique, you build up an array keyed on $1 during the first pass, then on the second pass you print only the records whose $1 was seen more than once.
I'm sure there are ways to do it in a single pass through the file, but I doubt they would be as "clean".
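For what it's worth, here is one possible single-pass sketch (not the answer's method): buffer the first line seen for each key, flush that buffer the moment the key's count reaches 2, and print every later occurrence immediately. The sample data is taken from the Proof of Concept below.

```shell
# Recreate the sample input from the Proof of Concept section
printf '1 abcd\n1 efgh\n2 ijkl\n3 mnop\n4 qrst\n4 uvwx\n' > ./infile

# One-pass variant: count[] tracks occurrences of each key,
# buf[] holds the first line seen for a key until we know it repeats.
awk '{
    count[$1]++
    if (count[$1] == 1)      buf[$1] = $0            # first sighting: hold it
    else if (count[$1] == 2) { print buf[$1]; print } # key repeats: flush both
    else                     print                    # third and later: print
}' ./infile
```

Note that lines are emitted as soon as a key is known to repeat, so if duplicate keys are interleaved with other keys the output order can differ from the two-pass version.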
Explanation
FNR==NR
: This is only true while awk is reading the first file. NR counts records across all input files, while FNR resets to 1 at the start of each file, so the two are equal only during the first file.
a[$1]++
: Builds an associative array a whose keys are the first field ($1) and whose values count how many times each key has been seen.
next
: Skip the rest of the script and start over with the next input record.
(a[$1] > 1)
: This is only evaluated on the second pass of ./infile, and it prints only the records whose first field ($1) was seen more than once. A pattern with no action prints matching records, so it is shorthand for if(a[$1] > 1){print $0}
Proof of Concept
$ cat ./infile
1 abcd
1 efgh
2 ijkl
3 mnop
4 qrst
4 uvwx
$ awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile
1 abcd
1 efgh
4 qrst
4 uvwx