
I have a list of names, one name per line, saved as a .txt file.

I'm trying to use bash to determine how many different names appear once, twice, or three times.

For example:

names.txt looks like

Donald
Donald
Lisa
John
Lisa
Donald

In this case, one name appears exactly once (John), one appears twice (Lisa), and one appears three times (Donald). I'm trying to get these counts for a bigger list using `uniq`. I know I can use `uniq -u` for unique lines and `uniq -d` for duplicated ones, but I'm not sure how to find names that appear three times.
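One common sketch of this (assuming standard `sort`, `uniq`, and `awk` are available) is to let `uniq -c` produce a per-name count first, and then filter on that count directly instead of relying on `uniq -u`/`uniq -d`:

```shell
# A sketch, not the only approach. names.txt is recreated here from
# the question's sample so the snippet is self-contained.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# uniq -c prefixes each distinct (sorted) line with its count;
# awk then keeps only names whose count is exactly 3.
sort names.txt | uniq -c | awk '$1 == 3 { print $2 }'
# prints: Donald
```

Changing the `3` to any other number selects names with that exact repetition count.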

WhatAmIDoing
  • This is very broad. What should the output look like? Have you actually tried using `uniq`? It won't work on its own, and I don't think you can use it directly to find words appearing three times. (What about words appearing more than three times?) – Benjamin W. May 04 '16 at 20:35

5 Answers

$ echo 'Donald
Donald
Lisa
John
Lisa
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
   1 1
   1 2
   1 3

The right column is the repetition count, and the left column is the number of unique names with that repetition count. E.g. “Donald” has a repetition count of 3.

Bigger example:

echo 'Donald
Donald
Rob
Lisa
WhatAmIDoing
John
Obama
Obama
Lisa
Washington
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
   4 1
   2 2
   1 3

Four names (“Rob”, “WhatAmIDoing”, “John”, and “Washington”) each have a repetition count of 1. Two names (“Lisa” and “Obama”) each have a repetition count of 2. One name (“Donald”) has a repetition count of 3.
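The same pipeline can read the question's names.txt directly instead of a here-string (a sketch; note the second sort is `sort -n` here, since a plain lexicographic sort would misorder repetition counts of 10 or more):

```shell
# Recreate the sample file so the snippet is self-contained.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# Count per name, keep only the counts, then count the counts.
# sort -n keeps the repetition counts in numeric order.
sort names.txt | uniq -c | awk '{print $1}' | sort -n | uniq -c
# one line per repetition count, e.g. "1 3" means one name appears 3 times
```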

rob mayoff

If you want to see the actual names for each "repetition count", perl is a good choice with its very flexible data structures:

perl -lne '
    $count{$_}++;
    END {
        while (($name, $num) = each %count) {
            push @{$map{$num}}, $name;
        }
        while (($num, $names) = each %map) {
            print "$num: @$names";
        }
    }
' << NAMES
Donald
Donald
Lisa
John
Lisa
Jim
Bob
Jim
Donald
NAMES
3: Donald
1: John Bob
2: Jim Lisa
glenn jackman

You can get a distinct count of names by using arrays in awk:

awk '{ a[$1]++ } END { for (n in a) print n, a[n] } ' yourfile

If you wanted to go one step further, you could run the same awk script against the output of the first one, but keyed on $2, to get the count of counts, which sounds like what you are after:

awk '{ a[$1]++ } END { for (n in a) print n, a[n] } ' yourfile |  awk '{ a[$2]++ } END { for (n in a) print n, a[n] } '

Which will output:

1 1
2 1
3 1

Which says: "One distinct name shows up once, one distinct name shows up twice, and one distinct name shows up three times."

I'm certain that could be done in a single awk script, but this seems simple enough as it is and it's pretty easy to parse.
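For reference, one way the two passes collapse into a single awk script (a sketch; awk's `for (k in …)` iteration order is unspecified, hence the trailing `sort -n`):

```shell
# Self-contained sample input from the question.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names.txt

# The first array counts each name; at END, a second array counts
# how many names share each count (the "count of counts").
awk '{ a[$1]++ } END { for (n in a) c[a[n]]++; for (k in c) print k, c[k] }' names.txt | sort -n
# prints:
# 1 1
# 2 1
# 3 1
```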

JNevill

Do you want to show something like this?

3 Donald
2 Lisa
1 John

If yes, then the following should do:

sort names.txt | uniq -c | sort -rn
MSameer

awk to the rescue!

awk '{a[$1]++} END{for(k in a) b[a[k]]++; for(k in b) print b[k], k}' names
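On the question's sample this prints one line per repetition count, number of names first (a usage sketch; piping through `sort -k2,2n` pins down awk's unspecified array order by the count column):

```shell
# Self-contained run of the one-liner above on the sample data;
# "names" matches the filename used in the answer.
printf '%s\n' Donald Donald Lisa John Lisa Donald > names
awk '{a[$1]++} END{for(k in a) b[a[k]]++; for(k in b) print b[k], k}' names | sort -k2,2n
# prints:
# 1 1
# 1 2
# 1 3
```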
karakfa