The simplest way is a double-pass algorithm where you read your file twice.
The idea is to store all values in an array a
and count how many times each one appears. If a value appears two or more times, it has more than a single entry and the line containing it should not be printed.
awk '(NR==FNR){a[$2]++; if(NF>2) a[$3]++; next}
(NF==2) && (a[$2]==1);
(NF==3) && (a[$2]==1 && a[$3]==1)' <file> <file>
In practice, you should avoid things such as a[var]==1
if you are not sure whether var
is in the array, as merely referencing it creates that array element. However, since the counts are never incremented again after the first pass, it is fine to proceed here.
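The side effect mentioned above can be seen in a small sketch. The function names and the subscript "x" are made up for illustration; the only point is that a comparison such as a["x"]==1 creates the element, while the membership test "x" in a does not:

```shell
# Comparing a["x"] with 1 references the element and thereby creates it.
count_after_reference() {
    awk 'BEGIN { if (a["x"] == 1) print "unexpected"; n = 0; for (k in a) n++; print n }'
}

# The "in" operator only tests membership and leaves the array untouched.
count_after_in_test() {
    awk 'BEGIN { if ("x" in a) print "unexpected"; n = 0; for (k in a) n++; print n }'
}

count_after_reference   # prints 1
count_after_in_test     # prints 0
```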
If you want to achieve the same thing with more than three fields, you can do:
awk '(NR==FNR){for(i=2;i<=NF;++i) a[$i]++; next }
{for(i=2;i<=NF;++i) if(a[$i]>1) next }
{print}' <file> <file>
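To see what this does, here is a small run on made-up data. The sample lines, the temporary file, and the wrapper name keep_unique are assumptions for illustration only: the first field is treated as an identifier and every other field as a value that must be unique across the file.

```shell
# Hypothetical sample input: an ID followed by one or more values.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id1 A B
id2 C
id3 A D
id4 E F G
id5 G
id6 H
EOF

keep_unique() {
    awk '(NR==FNR){ for(i=2;i<=NF;++i) a[$i]++; next }   # pass 1: count every value
         { for(i=2;i<=NF;++i) if(a[$i]>1) next }         # pass 2: skip lines holding a repeated value
         { print }' "$1" "$1"
}

keep_unique "$sample"
rm -f "$sample"
```

Only the lines id2 C and id6 H are printed: A occurs on two lines and so does G, which removes id1, id3, id4 and id5.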
While both these solutions read the file twice, you can also store the full file in memory and read the file only a single time. This, however, is exactly the same algorithm:
awk '{for(i=2;i<=NF;++i) a[$i]++; b[NR]=$0}
END{ for(j=1;j<=NR;++j) {
$0=b[j]; dup=0
for(i=2;i<=NF;++i) if(a[$i]>1) { dup=1; break }
if(!dup) print $0
}
}' <file>
comment: this single-pass solution is very simple and stores the full file in memory. The solution of James Brown is very clever: it removes lines from memory when they are no longer needed. A slightly shorter version is:
awk '{ b[NR]=$0; for(i=2;i<=NF;++i) if ($i in a) { delete b[a[$i]]; delete b[NR] } else a[$i]=NR }
END { for(n=1;n<=NR;++n) if(n in b) print b[n] }' <file>
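One practical difference is that a single-pass variant can read from a pipe, which the double-pass versions cannot do because they need to open the file twice. Below is a minimal sketch of the same bookkeeping idea, written out more verbosely. The function name keep_unique_stream, the array names line and first, and the sample input are made up for illustration:

```shell
# Sketch: single-pass filter reading standard input.
keep_unique_stream() {
    awk '{
        line[NR] = $0                      # tentatively keep the current line
        for (i = 2; i <= NF; ++i) {
            if ($i in first) {             # value was seen before
                delete line[first[$i]]     # drop the line on which it first appeared
                delete line[NR]            # and drop the current line as well
            } else {
                first[$i] = NR             # remember where this value first appeared
            }
        }
    }
    END { for (n = 1; n <= NR; ++n) if (n in line) print line[n] }'
}

printf 'id1 A B\nid2 C\nid3 A D\nid4 E\n' | keep_unique_stream
```

With this input the value A occurs on the first and third line, so only id2 C and id4 E are printed.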
note: you should never strive for the shortest solution, but for the most readable one!