2

Can anyone show me how to do the following in R? I want to calculate the number of unique people in each group, as in the sample below. The first column is the group (there are 3 groups here) and the second column is the person's name (e.g., in group 1, person A's name shows up 3 times). The third column is the one I want to generate in R: if someone's name shows up x times in a given group, the third column should contain x. Thank you all!

    x <- read.table(header=TRUE, text="group peoplename noofuniquepeople
1 A 3
1 B 1
1 A 3
1 A 3
1 D 1
2 M 1
2 K 2
2 T 3
2 T 3
2 K 2
2 T 3
3 E 2
3 F 1
3 E 2
3 G 2
3 G 2
3 V 1")
user001

4 Answers

2

Using ave and within:

within(x, Freq <- ave(1:nrow(x), peoplename, group, FUN=length))
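For anyone who wants to try this without the target column already present, here is a minimal self-contained sketch of the same `ave` idea. It rebuilds the question's data by hand and uses `seq_len(nrow(x))` as the dummy vector. The column name `Freq` follows the answer; the rest is illustrative.

```r
# Rebuild the question's data without the column we want to compute
x <- data.frame(
  group      = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  peoplename = c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V")
)

# ave() applies FUN within each (peoplename, group) combination and
# returns a vector aligned with the original rows, so row order is kept
x$Freq <- ave(seq_len(nrow(x)), x$peoplename, x$group, FUN = length)
x
```

Because `ave` returns one value per input row, the result can be assigned straight to a new column with no merge step.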
Ferdinand.kraft
1

You should ideally put in what you've tried first. We can help you debug.

Anyhow,

> df = data.frame(N = c("A","B","A","A","D","M","K","T","T","K","T","E","F","E","G","G","V"), G = c(3,1,3,3,1,1,2,3,3,2,3,2,1,2,2,2,1))
> df
   N G
1  A 3
2  B 1
3  A 3
4  A 3
5  D 1
6  M 1
7  K 2
8  T 3
9  T 3
10 K 2
11 T 3
12 E 2
13 F 1
14 E 2
15 G 2
16 G 2
17 V 1

> numberOfGroups = length(unique(df$G))
> numberOfGroups
[1] 3

> require(plyr)
> uniqueInGroup <- dlply(df,.fun=unique,.variables=.(G))
> uniqueInGroup
$`1`
  N G
1 B 1
2 D 1
3 M 1
4 F 1
5 V 1

$`2`
  N G
1 K 2
3 E 2
5 G 2

$`3`
  N G
1 A 3
4 T 3

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  G
1 1
2 2
3 3

lapply(uniqueInGroup, function(x) return(length(unique(x$N))))

Oops, I took the third column to be the group. Run this script with the first column as the group instead, and you'll have the required output.
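As a rough sketch of what grouping on the first column might look like, the snippet below counts distinct names per group. It uses base R's `tapply` rather than `plyr`, which is a substitution made for brevity. Note that this gives one number per group (distinct people), which is not the same as the per-row count in the question's third column. The data frame is rebuilt from the question; `uniquePerGroup` is an illustrative name.

```r
# Question's data, with the first column as the group
x <- data.frame(
  group      = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  peoplename = c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V")
)

# For each group, count how many distinct names appear in it
uniquePerGroup <- tapply(x$peoplename, x$group,
                         function(v) length(unique(v)))
uniquePerGroup
```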

jackStinger
1

There may be better ways, but

x$gp     <- paste(x$group, x$peoplename)
x_new    <- merge (x, table(x$gp), by.x="gp", by.y="Var1")
x_new$gp <- NULL

produces

> x_new
   group peoplename noofuniquepeople Freq
1      1          A                3    3
2      1          A                3    3
3      1          A                3    3
4      1          B                1    1
5      1          D                1    1
6      2          K                2    2
7      2          K                2    2
8      2          M                1    1
9      2          T                3    3
10     2          T                3    3
11     2          T                3    3
12     3          E                2    2
13     3          E                2    2
14     3          F                1    1
15     3          G                2    2
16     3          G                2    2
17     3          V                1    1

and the last two columns are the same

Henry
1

Good old base::aggregate has the advantage (in my opinion) of collapsing your data to one row for each combination of group and peoplename. length gives how many times that combination occurs:

aggregate( . ~ peoplename + group , data = x , FUN = length )
#   peoplename group noofuniquepeople
#1           A     1                3
#2           B     1                1
#3           D     1                1
#4           K     2                2
#5           M     2                1
#6           T     2                3
#7           E     3                2
#8           F     3                1
#9           G     3                2
#10          V     3                1

By the way, if your input data is missing the noofuniquepeople column (which I assume it is, because you want to calculate it), you don't need it. You can aggregate on a dummy variable instead, like this:

Unique = rep( 1 , nrow(x) )
aggregate( Unique ~ peoplename + group , data = x , FUN = sum )
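If the goal is the per-row column from the question rather than the collapsed table, one possible follow-up is to merge the aggregated counts back onto the original rows. This is a sketch under assumptions: the data is rebuilt without the target column, the dummy is stored as a temporary column instead of a free-standing vector, and `counts` and `result` are illustrative names.

```r
# Question's data without the column we want to compute
x <- data.frame(
  group      = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  peoplename = c("A", "B", "A", "A", "D", "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V")
)

# Temporary dummy column of ones; summing it counts rows per combination
x$Unique <- 1
counts <- aggregate(Unique ~ peoplename + group, data = x, FUN = sum)
names(counts)[names(counts) == "Unique"] <- "noofuniquepeople"
x$Unique <- NULL

# Attach the count to every original row (merge sorts rows by the keys)
result <- merge(x, counts, by = c("group", "peoplename"))
result
```

Keep in mind that `merge` reorders the rows by the key columns, so the output will not follow the original row order.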
Simon O'Hanlon