1

When computing any statistic using summarise and group_by we only get the summary statistic per-category, and not the value for all the population (Total). How to get both?

I am looking for something clean and short. Until now I can only think of:

bind_rows( 
  iris %>% group_by(Species) %>% summarise(
    "Mean" = mean(Sepal.Width), 
    "Median" = median(Sepal.Width), 
    "sd" = sd(Sepal.Width), 
    "p10" = quantile(Sepal.Width, probs = 0.1))
  , 
  iris %>% summarise(
    "Mean" = mean(Sepal.Width), 
    "Median" = median(Sepal.Width), 
    "sd" = sd(Sepal.Width), 
    "p10" = quantile(Sepal.Width, probs = 0.1)) %>% 
  mutate(Species = "Total")
  )

But I would like something more compact. In particular, I don't want to type the code (for summarize) twice, once for each group and once for the total.

Toto
  • 89,455
  • 62
  • 89
  • 125
  • 1
    I know this is a common way to _present_ some information, but generally it is a poor way to _store_ information as the fourth row is a "very different beast" from rows 1-3. I just wanted to put this warning out there... – DanY Aug 10 '18 at 20:30
  • 1
    I believe your code is only computing each mean only once; do you mean that you don't want to type the code twice, once for each group and once for the total? – Aaron left Stack Overflow Aug 10 '18 at 20:54
  • @DanY: Agreed, and very much appreciate your tact and recognition that different people and circumstances have different needs, this is a great example of how to be friendly on SO. – Aaron left Stack Overflow Aug 10 '18 at 20:56
  • Yes @Aaron. Thanks. Will edit to reflect your comment. – Fernando Hoces De La Guardia Aug 10 '18 at 20:57
  • The title's a misnomer, it's really *"How to also compute summary statistics on entire ungrouped dataset, and concatenate total"* – smci Aug 10 '18 at 23:39

2 Answers2

2

You can simplify it if you untangle what you're trying to do: you have iris data that has several species, and you want that summarized along with data for all species. You don't need to calculate those summary stats before you can bind. Instead, bind iris with a version of iris that's been set to Species = "Total", then group and summarize.

library(tidyverse)

bind_rows(
  iris,
  iris %>% mutate(Species = "Total")
) %>%
  group_by(Species) %>%
  summarise(Mean = mean(Sepal.Width),
            Median = median(Sepal.Width),
            sd = sd(Sepal.Width),
            p10 = quantile(Sepal.Width, probs = 0.1))
#> # A tibble: 4 x 5
#>   Species     Mean Median    sd   p10
#>   <chr>      <dbl>  <dbl> <dbl> <dbl>
#> 1 setosa      3.43    3.4 0.379  3   
#> 2 Total       3.06    3   0.436  2.5 
#> 3 versicolor  2.77    2.8 0.314  2.3 
#> 4 virginica   2.97    3   0.322  2.59

I like the caution in the comments above, though I have to do this sort of calculation for work enough that I have a similar shorthand function in a personal package. It perhaps makes less sense for things like standard deviations, but it's something I need to do a lot for adding up totals of demographic groups, etc. (If it's useful, that function is here).

camille
  • 16,432
  • 18
  • 38
  • 60
  • Thanks @camille!. Best answer so far. I get the comment that this computation requires to store info inefficiently, but I am still surprised that there is no standard function to perform this type of analysis. Makes it harder to convince Stata users to switch. – Fernando Hoces De La Guardia Aug 11 '18 at 01:59
0

bit shorter, though quite similar to bind_rows

    q10 <- function(x){quantile(x , probs=0.1)}

    iris %>% 
      select(Species,Sepal.Width)%>%
      group_by(Species) %>% 
      summarise_all(c("mean", "sd", "q10")) %>% 
      t() %>% 

      cbind(c("total", iris %>% select(Sepal.Width) %>% summarise_all(c("mean", "sd", "q10")))) %>% 
      t()

more clean probably:

  bind_rows( 
    iris %>% 
      group_by(Species) %>%  
      select(Sepal.Width)%>%
      summarise_all(c("mean", "sd", "q10"))
    , 
    iris %>% 
      select(Sepal.Width)%>%
      summarise_all(c("mean", "sd", "q10")) %>% 
      mutate(Species = "Total")
  )
smci
  • 32,567
  • 20
  • 113
  • 146
Nar
  • 648
  • 4
  • 8