This counts the lines of a file:
wc -l file.txt
This counts the number of distinct values that occur in the 3rd column:
gawk '{print $3}' file.txt | sort | uniq | wc -l
awk can do all of this in one command:
awk '{uniq[$3]} END{print NR, length(uniq), NR/length(uniq)}' file.txt
That is:
{uniq[$3]}
keeps track of the values that appeared in the 3rd column: merely referencing uniq[$3] creates an array entry for that value, so the array ends up with one key per distinct value.
END{print NR, length(uniq), NR/length(uniq)}
prints the number of lines (NR), the number of distinct values, and the ratio of the two. This works because NR in the END block normally holds the number of the last line that was read, and hence the number of lines, and length() returns the number of elements when given an array.
Test
$ cat a
1
2
3
1
2
3
$ awk '{uniq[$1]} END{print NR, length(uniq), NR/length(uniq)}' a
6 3 2
$ awk '{uniq[$1]} END{printf "lines: %d; different items: %d; proportion: %f\n", NR, length(uniq), NR/length(uniq)}' a
lines: 6; different items: 3; proportion: 2.000000
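A variant, in case it is useful: length() applied to an array may not be available in every awk implementation, and an empty file would make the division fail. The sketch below avoids both by counting a value the first time it is seen and checking the count before dividing. The function name count_distinct, the variable names col, seen and distinct, and the "n/a" placeholder are assumptions made for illustration, not part of the original command.

```shell
# Hypothetical helper: count_distinct COLUMN FILE
# Prints: <lines> <distinct values in COLUMN> <lines / distinct>
count_distinct() {
    awk -v col="$1" '
        # Increment the counter only the first time a value appears,
        # so length() on an array is not needed.
        !seen[$col]++ { distinct++ }
        END {
            # Guard against empty input, where distinct is still unset.
            if (distinct > 0)
                print NR, distinct, NR / distinct
            else
                print NR, 0, "n/a"
        }
    ' "$2"
}
```

With the file a from the test above, count_distinct 1 a prints the same "6 3 2" as the original one-liner.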