12
library(tidyverse) 

I'm stuck on something that should be so simple! Using the code below, all I want to do is group and summarise the three "Var" columns. I want counts and sums (so that I can create three percentage columns, so bonus if you can include an easy way to accomplish this in your answer). However, I don't want to include the NA's. Removing the NA's from sum is easy enough by using "na.rm=TRUE", but I can't seem to figure out how to not include the NA's in the counts (using n() ) while using dplyr::summarise_at.

Am I missing something very simple?

Df%>%group_by(Group)%>%summarise_at(vars(Var1:Var3),funs(n(),sum((.),na.rm=TRUE)))

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)
Mike
  • 2,017
  • 6
  • 26
  • 53

2 Answers2

14

I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).

library(dplyr)
Df %>% 
  group_by(Group) %>% 
  summarise_all(funs(count = sum(!is.na(.)), 
                     sum = sum(.,na.rm=TRUE),
                     pct = sum(.,na.rm=TRUE)/sum(!is.na(.))))

#> # A tibble: 2 x 10
#>    Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#>   <fctr>      <int>      <int>      <int>    <dbl>    <dbl>    <dbl>
#> 1  Condo          2          2          2        1        2        1
#> 2  House          5          6          4        4        5        4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> #   Var3_pct <dbl>

I've also used summarise_all instead of summarise_at as summarise_all works on all the variables which aren't group variables.

markdly
  • 4,394
  • 2
  • 19
  • 27
  • It is also possible to specify `na.rm = TRUE` outside the `funs` argument: `Df%>% group_by(Group) %>% summarise_all(funs(n= sum(!is.na(.)), sum), na.rm = TRUE)` Although this doesn't seem to allow the trick of adding new columns such as 'percent' (cf @markdly 's answer), you can use many different functions in one call without needing to specify `na.rm` all the time... – tjebo Jan 12 '18 at 11:02
  • @markdly, Thanks for the answer. Can you please add the output for `summarise_at` too? I receive an error like: `Error: expecting a one sided formula, a function, or a function name.` – Prradep Dec 01 '20 at 12:02
-1

I think you just need to move your 'na.rm()' argument back in the parentheses. See below:

Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)    
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)

Df<-data.frame(Group,Var1,Var2,Var3)

out <- Df %>%
  group_by(Group) %>% 
  mutate_at(vars(Var1:Var3), funs(total = sum(!(is.na(.))), sum = sum(., na.rm = T))) %>% 
  ungroup()
cody_stinson
  • 390
  • 1
  • 3
  • 12