How to calculate # of unique names in each group

Question

Can anyone show me how to do the following in R? I want to count how many times each person's name appears within each group. In the sample below, the first column is the group (there are 3 groups here) and the second column is the person's name (e.g., in group 1, person A's name shows up 3 times). The third column is the one I want to generate in R: if someone's name shows up x times in a given group, the last column should be x. Thank you all!

    x <- read.table(header=TRUE, text="group peoplename noofuniquepeople
1 A 3
1 B 1
1 A 3
1 A 3
1 D 1
2 M 1
2 K 2
2 T 3
2 T 3
2 K 2
2 T 3
3 E 2
3 F 1
3 E 2
3 G 2
3 G 2
3 V 1")

This may be helpful: http://stackoverflow.com/questions/17223308/fastest-way-to-count-occurrences-of-each-unique-element — Ferdinand.kraft, Sep 07 '13 at 10:39

score 2 · Accepted Answer · answered Sep 07 '13 at 10:55


Using ave and within:

within(x, Freq <- ave(1:nrow(x), peoplename, group, FUN=length))
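As a quick check, here is a minimal self-contained sketch that rebuilds the question's sample data without the target column and applies the same ave idea. The names `dat` and `Freq` are only illustrative, and `seq_len(nrow(dat))` is used in place of `1:nrow(x)` as a matter of habit rather than necessity.

```r
# Rebuild the question's sample data without the target column
dat <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  peoplename = c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V")
)

# ave() applies FUN within each (peoplename, group) combination and returns
# a vector aligned with the original rows, so the row order is preserved
dat$Freq <- ave(seq_len(nrow(dat)), dat$peoplename, dat$group, FUN = length)

dat$Freq
```

Because the result has one value per input row, it can be attached directly as a new column, which is what the question asks for.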

Ferdinand.kraft

this one is easy to remember, thanks – user001 Sep 07 '13 at 21:27

score 1 · Answer 2 · answered Sep 07 '13 at 09:35

You should ideally put in what you've tried first. We can help you debug.

Anyhow,

> df = data.frame(N = c("A","B","A","A","D","M","K","T","T","K","T","E","F","E","G","G","V"), G = c(3,1,3,3,1,1,2,3,3,2,3,2,1,2,2,2,1))
> df
   N G
1  A 3
2  B 1
3  A 3
4  A 3
5  D 1
6  M 1
7  K 2
8  T 3
9  T 3
10 K 2
11 T 3
12 E 2
13 F 1
14 E 2
15 G 2
16 G 2
17 V 1

> numberOfGroups = length(unique(df$G))
> numberOfGroups
[1] 3

> require(plyr)
> uniqueInGroup <- dlply(df,.fun=unique,.variables=.(G))
> uniqueInGroup
$`1`
  N G
1 B 1
2 D 1
3 M 1
4 F 1
5 V 1

$`2`
  N G
1 K 2
3 E 2
5 G 2

$`3`
  N G
1 A 3
4 T 3

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  G
1 1
2 2
3 3

> lapply(uniqueInGroup, function(x) length(unique(x$N)))

Oops, I took the third column to be the group. Run this script with the first column as the grouping variable instead to get the number of unique names per group.
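For reference, a rough sketch of that correction, grouping on the first column. It uses base R's tapply rather than the plyr calls above, which is a swap on my part, and the vector names `group`, `peoplename` and `n_unique` are just for illustration.

```r
# Sample data from the question: first column is the group, second the name
group <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
peoplename <- c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                "E", "F", "E", "G", "G", "V")

# Count the distinct names within each group
n_unique <- tapply(peoplename, group, function(n) length(unique(n)))
n_unique
```

This gives one number per group (distinct names), which is not the same thing as the per-row occurrence count in the question's third column.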

score 1 · Answer 3 · answered Sep 07 '13 at 09:35

There may be better ways, but

x$gp     <- paste(x$group, x$peoplename)
x_new    <- merge (x, table(x$gp), by.x="gp", by.y="Var1")
x_new$gp <- NULL

produces

> x_new
   group peoplename noofuniquepeople Freq
1      1          A                3    3
2      1          A                3    3
3      1          A                3    3
4      1          B                1    1
5      1          D                1    1
6      2          K                2    2
7      2          K                2    2
8      2          M                1    1
9      2          T                3    3
10     2          T                3    3
11     2          T                3    3
12     3          E                2    2
13     3          E                2    2
14     3          F                1    1
15     3          G                2    2
16     3          G                2    2
17     3          V                1    1

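and the last two columns are the same, so Freq matches the desired count. Note that merge() sorts the rows by the key, so the original row order is not preserved.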

Simon O'Hanlon · Answer 4 · 2013-09-07T12:49:34.217

Using good old base::aggregate has the advantage (in my opinion) of aggregating your data to display one row for each group and peoplename within that group. length gives how many times that combination occurs:

aggregate( . ~ peoplename + group , data = x , FUN = length )
#   peoplename group noofuniquepeople
#1           A     1                3
#2           B     1                1
#3           D     1                1
#4           K     2                2
#5           M     2                1
#6           T     2                3
#7           E     3                2
#8           F     3                1
#9           G     3                2
#10          V     3                1

By the way, if your input data is missing the noofuniquepeople column (which I assume it is, because you want to calculate it), you don't need it. You can use a dummy variable to aggregate on, like this:

Unique = rep( 1 , nrow(x) )
aggregate( Unique ~ peoplename + group , data = x , FUN = sum )
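A self-contained variant of the dummy-variable idea might look like the sketch below. Here the dummy is stored as a column of the data frame instead of a free-standing vector, and the names `dat` and `counts` are my own.

```r
# Sample data from the question, without the target column
dat <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  peoplename = c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V")
)

# Dummy column of ones; summing it within each combination counts the rows
dat$Unique <- 1

# One row per (peoplename, group) combination, with the number of occurrences
counts <- aggregate(Unique ~ peoplename + group, data = dat, FUN = sum)
counts
```

Keeping the dummy inside the data frame avoids relying on a variable in the calling environment, at the cost of adding a helper column.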
