
I have a list of names, one name per line, saved as a .txt file.

I'm trying to use bash to determine how many different names appear once, twice, or three times.

For example:

names.txt looks like

Donald
Donald
Lisa
John
Lisa
Donald

In this case, one name appears exactly once (John), one appears twice (Lisa), and one appears three times (Donald). I'm trying to get these counts for a bigger list using `uniq`. I know I can use `uniq -u` for unique lines and `uniq -d` for duplicated ones, but I'm not sure how to find names that appear three times.
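One common sketch of this (assuming standard `sort`, `uniq`, and `awk` are available) is to let `uniq -c` produce a per-name count first, and then filter on that count directly instead of relying on `uniq -u`/`uniq -d`:

```shell
# A sketch, not the only approach. names.txt is recreated here from
# the question's sample so the snippet is self-contained.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# uniq -c prefixes each distinct (sorted) line with its count;
# awk then keeps only names whose count is exactly 3.
sort names.txt | uniq -c | awk '$1 == 3 { print $2 }'
# prints: Donald
```

Changing the `3` to any other number selects names with that exact repetition count.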

WhatAmIDoing
  • This is very broad. What should the output look like? Have you actually tried using `uniq`? It won't work on its own, and I don't think you can use it directly to find words appearing three times. (What about words appearing more than three times?) – Benjamin W. May 04 '16 at 20:35

5 Answers

$ echo 'Donald
Donald
Lisa
John
Lisa
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
   1 1
   1 2
   1 3

The right column is the repetition count, and the left column is the number of unique names with that repetition count. E.g. “Donald” has a repetition count of 3.

Bigger example:

echo 'Donald
Donald
Rob
Lisa
WhatAmIDoing
John
Obama
Obama
Lisa
Washington
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
   4 1
   2 2
   1 3

Four names (“Rob”, “WhatAmIDoing”, “John”, and “Washington”) each have a repetition count of 1. Two names (“Lisa” and “Obama”) each have a repetition count of 2. One name (“Donald”) has a repetition count of 3.
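The same pipeline can read the question's names.txt directly instead of a here-string (a sketch; note the second sort is `sort -n` here, since a plain lexicographic sort would misorder repetition counts of 10 or more):

```shell
# Recreate the sample file so the snippet is self-contained.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# Count per name, keep only the counts, then count the counts.
# sort -n keeps the repetition counts in numeric order.
sort names.txt | uniq -c | awk '{print $1}' | sort -n | uniq -c
# one line per repetition count, e.g. "1 3" means one name appears 3 times
```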

rob mayoff

If you want to see the actual names for each "repetition count", perl is a good choice with its very flexible data structures:

perl -lne '
    $count{$_}++;
    END {
        while (($name, $num) = each %count) {
            push @{$map{$num}}, $name;
        }
        while (($num, $names) = each %map) {
            print "$num: @$names";
        }
    }
' << NAMES
Donald
Donald
Lisa
John
Lisa
Jim
Bob
Jim
Donald
NAMES
3: Donald
1: John Bob
2: Jim Lisa
glenn jackman

You can get a distinct count of names by using arrays in awk:

awk '{ a[$1]++ } END { for (n in a) print n, a[n] } ' yourfile

If you wanted to go one step further, you could run the same awk script against the output of the first one, but keyed on $2, to get the count of counts, which sounds like what you are after:

awk '{ a[$1]++ } END { for (n in a) print n, a[n] } ' yourfile |  awk '{ a[$2]++ } END { for (n in a) print n, a[n] } '

Which will output:

1 1
2 1
3 1

Which says: "One distinct name shows up once, one distinct name shows up twice, and one distinct name shows up three times."

I'm certain that could be done in a single awk script, but this seems simple enough as it is and it's pretty easy to parse.
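For reference, one way the two passes collapse into a single awk script (a sketch; awk's `for (k in …)` iteration order is unspecified, hence the trailing `sort -n`):

```shell
# Self-contained sample input from the question.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# The first array counts each name; at END, a second array counts
# how many names share each count (the "count of counts").
awk '{ a[$1]++ } END { for (n in a) c[a[n]]++; for (k in c) print k, c[k] }' names.txt | sort -n
# prints:
# 1 1
# 2 1
# 3 1
```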

JNevill

Do you want to show something like this?

3 Donald
2 Lisa
1 John

If yes, then the following should do:

sort names.txt | uniq -c | sort -rn
MSameer

awk to the rescue!

awk '{a[$1]++} END{for(k in a) b[a[k]]++; for(k in b) print b[k], k}' names
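On the question's sample this prints one line per repetition count, number of names first (a usage sketch; piping through `sort -k2,2n` pins down awk's unspecified array order by the count column):

```shell
# Self-contained run of the one-liner above on the sample data;
# "names" matches the filename used in the answer.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names
awk '{a[$1]++} END{for(k in a) b[a[k]]++; for(k in b) print b[k], k}' names | sort -k2,2n
# prints:
# 1 1
# 1 2
# 1 3
```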
karakfa