Ddply and summary of categorical variables

Question

I have a data frame x like this:

Id   Group   Var1
001    A     yes
002    A     no
003    A     yes
004    B     no
005    B     yes
006    C     no

I want to create a data frame like this:

Group    yes    no
A        2      1
B        1      1
C        0      1

The function aggregate works well:

aggregate(x$Var1 ~ x$Group,FUN=summary)

but I am not able to create a data frame from the results.

If I try using ddply:

ddply(x,"Group",function(x) summary(x$Var1))

I get the error: Results do not have equal lengths.

What am I doing wrong?

Thanks.

`ddply(x,"Group",function(x) summary(x$Var1))` works fine for me — user1317221_G, Feb 17 '13 at 15:46
The version is 1.8. I think the problem is due to the presence of NAs in my data frame, but I don't understand why this error comes up with ddply but not with aggregate — corrado, Feb 17 '13 at 15:48

score 4 · Answer 1 · answered Feb 17 '13 at 15:53

This doesn't answer your question about ddply, but it should help you with your aggregate output. The second column in the result of the aggregate command you used is a matrix, but you can wrap the whole output in a do.call(data.frame, ...) statement to get a plain data frame instead. Assuming your data.frame is called "mydf":

temp <- do.call(data.frame, aggregate(Var1 ~ Group, mydf, summary))
temp
#   Group Var1.no Var1.yes
# 1     A       1        2
# 2     B       1        1
# 3     C       1        0
str(temp)
# 'data.frame':  3 obs. of  3 variables:
#  $ Group   : Factor w/ 3 levels "A","B","C": 1 2 3
#  $ Var1.no : int  1 1 1
#  $ Var1.yes: int  2 1 0

Alternatively, you might look at table:

table(mydf$Group, mydf$Var1)
#    
#     no yes
#   A  1   2
#   B  1   1
#   C  1   0
as.data.frame.matrix(table(mydf$Group, mydf$Var1))
#   no yes
# A  1   2
# B  1   1
# C  1   0
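If you want the groups as a real Group column rather than row names (the layout asked for in the question), one option is to move the row names into a column after the conversion. A minimal sketch, assuming mydf holds the question's data; counts and result are illustrative names.

```r
mydf <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = c("yes", "no", "yes", "no", "yes", "no")
)

# Cross-tabulate and convert the table to a wide data frame (groups as row names)
counts <- as.data.frame.matrix(table(mydf$Group, mydf$Var1))

# Put the groups in their own column and order the count columns as yes, no;
# row.names = NULL gives plain 1, 2, 3 row names
result <- data.frame(Group = rownames(counts), counts[, c("yes", "no")],
                     row.names = NULL)
result
```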

score 3 · Accepted Answer · agstudy · 2013-02-17T16:05:31.690

I introduced an NA into your data:

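# stringsAsFactors = TRUE keeps Var1 a factor (the default became FALSE in R 4.0.0)
dat <- read.table(text = 'Id   Group   Var1
001    A     yes
002    A     no
003    A     NA     ## here!
004    B     no
005    B     yes
006    C     no', header = TRUE, stringsAsFactors = TRUE)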

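You need to remove the NAs before calling summary. For a group that contains missing values, summary adds an extra "NA's" count, so that group returns a longer vector than the others and ddply cannot combine the results (hence "Results do not have equal lengths"). The aggregate formula method has a default setting of na.action = na.omit, which drops the rows containing NA before summary is called, so the extra NA's column never appears there. Here is a workaround that removes the NAs before the summary: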

library(plyr)
ddply(dat, "Group", function(x) {
  # drop the NAs first so every group returns the same number of counts
  summary(na.omit(x$Var1))
})
 Group no yes
1     A  1   1
2     B  1   1
3     C  1   0

which is equivalent to:

x <- dat
aggregate(x$Var1 ~ x$Group,FUN=summary)
  x$Group x$Var1.no x$Var1.yes
1       A         1          1
2       B         1          1
3       C         1          0
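The length mismatch behind the ddply error can be seen directly by calling summary() on each group's Var1. A minimal sketch, assuming dat holds the data above with Var1 as a factor; per_group is an illustrative name.

```r
dat <- data.frame(
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = factor(c("yes", "no", NA, "no", "yes", "no")),
  stringsAsFactors = TRUE
)

# summary() of a factor returns one count per level, plus an "NA's" entry
# only when the group actually contains missing values
per_group <- lapply(split(dat$Var1, dat$Group), summary)
lengths(per_group)   # A has 3 entries, B and C have 2
names(per_group$A)   # "no" "yes" "NA's"

# Dropping the NAs first gives every group the same length
lengths(lapply(split(dat$Var1, dat$Group), function(v) summary(na.omit(v))))

# table() ignores NAs by default (useNA = "no"), so its counts line up as well
table(dat$Group, dat$Var1)
```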

edited Feb 17 '13 at 16:05

answered Feb 17 '13 at 15:42

agstudy



+1. Perhaps it would be useful to point out that the `aggregate` formula method has a default setting of `na.action = na.omit`, which would account for the difference (or rather, similarity) between the two approaches. – A5C1D2H2I1M1N2O1R2T1 Feb 17 '13 at 16:01
@AnandaMahto Well put! I'll add it. I don't know the `aggregate` function very well. – agstudy Feb 17 '13 at 16:03
