I have a data frame with multiple variables, each of which takes the values TRUE, FALSE, or NA. I'm trying to summarize the data, but I can't get anything to work quite the way I want.

names <- c("n1","n2","n3","n4","n5","n6")
groupname <- c("g1","g2","g3","g4","g4","g4")
var1 <- c(TRUE,TRUE,NA,FALSE,TRUE,NA)
var2 <- c(FALSE,TRUE,NA,FALSE,TRUE,NA)
var3 <- c(FALSE,TRUE,NA,FALSE,TRUE,NA)
df <- data.frame(names,groupname,var1,var2,var3)

I'm trying to summarize the data for individual groups:

G4      TRUE   FALSE   NA
var1    3      1       2
var2    2      2       2
var3    2      2       2

I can do table(groupname,var1) to do them individually, but I'm trying to get it all in a single table. Any suggestions?

ekad

2 Answers

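Using dplyr and tidyr (gather() comes from tidyr, not dplyr):

library(dplyr)
library(tidyr)  # provides gather()

# Reshape var1:var3 to long format, then count each value per variable
df %>% gather("key", "value", var1:var3) %>%
        group_by(key) %>%
        summarise(true = sum(value == TRUE, na.rm = TRUE),
                  false = sum(!value, na.rm = TRUE),
                  missing = sum(is.na(value)))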

#    key  true false missing
#1  var1     3     1       2
#2  var2     2     2       2
#3  var3     2     2       2
joel.wilson
Wietze314

In base R, you can use table to get the counts, lapply to loop over the variables, and do.call with rbind to combine the results into one matrix. Subsetting with [ then puts the columns in the desired order.

do.call(rbind, lapply(df[3:5], table, useNA="ifany"))[, c(2,1,3)]
     TRUE FALSE <NA>
var1    3     1    2
var2    2     2    2
var3    2     2    2
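
The call above pools all rows, while the question asks about individual groups. A minimal sketch of a per-group version in base R, assuming the df from the question; count_tfn is a hypothetical helper name introduced here for illustration, not a base R function:

```r
# Rebuild the example data from the question
df <- data.frame(
  names     = c("n1", "n2", "n3", "n4", "n5", "n6"),
  groupname = c("g1", "g2", "g3", "g4", "g4", "g4"),
  var1 = c(TRUE, TRUE, NA, FALSE, TRUE, NA),
  var2 = c(FALSE, TRUE, NA, FALSE, TRUE, NA),
  var3 = c(FALSE, TRUE, NA, FALSE, TRUE, NA)
)

# Hypothetical helper: count TRUE, FALSE and NA in one logical vector.
# %in% never returns NA, so missing values are only counted by is.na().
count_tfn <- function(x) {
  c("TRUE" = sum(x %in% TRUE), "FALSE" = sum(x %in% FALSE), "NA" = sum(is.na(x)))
}

# Split the variable columns by group, then build one count matrix per group
by_group <- lapply(split(df[3:5], df$groupname),
                   function(d) t(sapply(d, count_tfn)))

by_group$g4  # rows are var1..var3, columns are TRUE / FALSE / NA
```

Because the helper always returns three named counts, every group gets a matrix of the same shape even when a value never occurs in that group.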

This works only if every variable contains all three values (TRUE, FALSE, NA). If a value never occurs in a variable, you can make table report it with a count of 0 by passing it a factor with the levels set explicitly. Here is an example.

# expand data set
df$var4 <- c(TRUE, NA)

do.call(rbind, lapply(df[3:6],
                 function(i) table(factor(i, levels=c(TRUE, FALSE, NA)),
                                   useNA="ifany")))[, c(2,1,3)]

     FALSE TRUE <NA>
var1     1    3    2
var2     2    2    2
var3     2    2    2
var4     0    3    3
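
One caveat, as far as I can tell: with useNA="ifany", a variable that has no NA produces a shorter table, and rbind then recycles its values to fill the row, which would put counts in the wrong column. A hedged sketch of a variant that uses useNA="always" so every row has the same three columns; var5 is a hypothetical column added here only to show the no-NA case:

```r
# Small example: var1 and var4 as above, plus a hypothetical var5 with no NA
df <- data.frame(
  var1 = c(TRUE, TRUE, NA, FALSE, TRUE, NA),
  var4 = c(TRUE, NA, TRUE, NA, TRUE, NA),
  var5 = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Fixing the levels sets the column order (TRUE, FALSE);
# useNA = "always" adds an NA column even when the NA count is 0
counts <- do.call(rbind, lapply(df, function(i)
  table(factor(i, levels = c(TRUE, FALSE)), useNA = "always")))

counts
```

With the levels given in the order TRUE, FALSE, the columns already come out in the desired order, so the extra [ subsetting step is not needed in this variant.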
lmo
  • This worked too, although there were some rows that didn't have any FALSE values and the NA count appeared under both the FALSE and NA columns. I don't understand why. – Brandon Booth Feb 03 '17 at 22:46
  • Ah, I missed that. I've added an alternative that will count properly when one or more levels is missing. – lmo Feb 04 '17 at 16:51