3

I have a data frame that looks like this:

ID Group Drink
1   A      yes
2   A      no
3   A      NA
4   B      no
5   B      no
6   B      yes

and I would like to count how many people in group A drink and how many people in group B drink.

I am using length(), but this function returns 3 (the NA is being counted as if it were yes). How can I fix it?

Brian Tompsett - 汤莱恩
Tormod

5 Answers

4

table() is one option:

db <- read.table(text = "ID Group Drink
1   A      yes
2   A      no
3   A      NA
4   B      no
5   B      no
6   B      yes", header = TRUE)

with(db, table(Drink))
with(db, table(Group, Drink))

> with(db, table(Drink))
Drink
 no yes 
  3   2 
> with(db, table(Group, Drink))
     Drink
Group no yes
    A  1   1
    B  2   1

Including the NA as a class is achieved by the useNA argument:

with(db, table(Drink, useNA = "ifany"))

> with(db, table(Drink, useNA = "ifany"))
Drink
  no  yes <NA> 
   3    2    1

You can of course store the objects returned by table() and index them like any other matrix/array:

tab <- with(db, table(Drink, useNA = "ifany"))
tab[1]
tab2 <- with(db, table(Group, Drink, useNA = "ifany"))
tab2[,1]
tab2[1,]

> tab <- with(db, table(Drink, useNA = "ifany"))
> tab[1]
no 
 3 
> tab2 <- with(db, table(Group, Drink, useNA = "ifany"))
> tab2[,1]
A B 
1 2 
> tab2[1,]
  no  yes <NA> 
   1    1    1
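
To answer the original question directly, the "yes" column of the two-way table can be selected by name rather than by position. This is only a minimal sketch, assuming the db data frame defined above (it is rebuilt here so the snippet runs on its own); the name drinkers is just an illustrative choice:

```r
db <- read.table(text = "ID Group Drink
1   A      yes
2   A      no
3   A      NA
4   B      no
5   B      no
6   B      yes", header = TRUE)

# Two-way table of Group by Drink; table() drops NA by default
tab2 <- with(db, table(Group, Drink))

# Number of drinkers per group, selected by column name
drinkers <- tab2[, "yes"]
drinkers
```

Selecting by name keeps the code correct even if the column order changes, for example when useNA adds an extra column.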
Gavin Simpson
2

Here's another way, using aggregate(...), where df is the data frame (called db in the answer above):

aggregate(Drink~Group,df,function(x)sum(x=="yes"))
#   Group Drink
# 1     A     1
# 2     B     1

To get the percent that drink:

# sum(!is.na(x)) counts the non-missing answers; length(!is.na(x)) would count every element
aggregate(Drink~Group,df,function(x)sum(x=="yes")/sum(!is.na(x)))
#   Group     Drink
# 1     A 0.5000000
# 2     B 0.3333333
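
The same counts and proportions can probably also be had without the formula interface. length() counts every element, NA included, which is why it reports 3; summing or averaging a logical vector with na.rm = TRUE skips the missing answers instead. A rough sketch, assuming the data frame is named db as in the first answer (is_drinker, n_yes and p_yes are illustrative names):

```r
db <- read.table(text = "ID Group Drink
1   A      yes
2   A      no
3   A      NA
4   B      no
5   B      no
6   B      yes", header = TRUE)

# Logical indicator: TRUE for "yes", FALSE for "no", NA stays NA
is_drinker <- db$Drink == "yes"

# Number of drinkers per group, ignoring missing answers
n_yes <- tapply(is_drinker, db$Group, sum, na.rm = TRUE)

# Share of drinkers among the non-missing answers in each group
p_yes <- tapply(is_drinker, db$Group, mean, na.rm = TRUE)

n_yes
p_yes
```

Unlike the aggregate() formula call, which drops rows with NA before the function is applied, this version makes the NA handling explicit through na.rm.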
jlhoward
1

xtabs is another option:

xtabs(~ Group + Drink, df)

#     Drink
#Group no yes
#    A  1   1
#    B  2   1

And in case you need a data.frame as output:

as.data.frame(xtabs(~ Group + Drink, df))

#  Group Drink Freq
#1     A    no    1
#2     B    no    2
#3     A   yes    1
#4     B   yes    1
talat
0

Assuming d is your data, and assuming NA is considered yes (since you stated that in your post), the proportions of drinkers are

> d$Drink[is.na(d$Drink)] <- 'yes'
> tab <- table(d$Group, d$Drink)
> tab[,'yes']/rowSums(tab)
##         A         B 
## 0.6666667 0.3333333 

You could also play around with the count function in package plyr

> library(plyr)
> x <- count(d)
> cbind(x[x$Drink == 'yes', ], inGroup = count(d$Group)$freq)
#   Group Drink freq inGroup
# 2     A   yes    2       3
# 4     B   yes    1       3
Rich Scriven
-1

You could also make use of the function prop.table(), which converts the counts in a table returned by table() into proportions.
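
As a sketch of how that might look, assuming the data frame is named db as in the first answer (tab and props are illustrative names), margin = 1 makes the proportions sum to 1 within each group:

```r
db <- read.table(text = "ID Group Drink
1   A      yes
2   A      no
3   A      NA
4   B      no
5   B      no
6   B      yes", header = TRUE)

# Counts of Drink within each Group; NA is excluded by default
tab <- with(db, table(Group, Drink))

# Row-wise proportions: each group's row sums to 1
props <- prop.table(tab, margin = 1)

# Proportion of drinkers per group, among non-missing answers
props[, "yes"]
```

Because table() drops the NA row first, the proportions are relative to the people who actually answered.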

lawyeR