2

I have a basic dataframe:

a = c(1,4,3,5)
b = c(3,6,3,11)

mydata = data.frame(a,b)

I would like to obtain a dataframe with the same two columns (a and b), but with basic summary statistics (e.g. min, mean, max, sd) as the rows.

Is there a dplyr command for this?

M--
Simon
  • Would this suffice? `do.call(cbind, lapply(mydata, summary))` – Sotos Dec 03 '19 at 15:12
  • @Sotos isn't `do.call(cbind, lapply(...))` basically the same as `sapply`? – M-- Dec 03 '19 at 15:23
  • @M-- Yes it is. Why I used `do.call(lapply...`? No idea :) – Sotos Dec 03 '19 at 15:28
  • Several solutions [here](https://stackoverflow.com/q/34594641/5325862) that would just require transposing. Several in base R [here](https://stackoverflow.com/q/20997380/5325862). – camille Dec 03 '19 at 16:48

2 Answers

4

It may be better to reshape the data to 'long' format first and then compute the summary by group:

library(dplyr)
library(tidyr)
mydata %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise_at(vars(value), list(Min = min, Mean = mean, Max = max, Sd = sd))
# A tibble: 2 x 5
#  name    Min  Mean   Max    Sd
#  <chr> <dbl> <dbl> <dbl> <dbl>
#1 a         1  3.25     5  1.71
#2 b         3  5.75    11  3.77
akrun
  • Reshaping is a good approach, but I think the OP wants it transposed back so a & b are the columns – camille Dec 03 '19 at 16:51
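
To get a and b back as columns, as the comment above suggests, the long-format summary can be reshaped once more. This is only a sketch, assuming tidyr >= 1.0.0 (for `pivot_longer`/`pivot_wider`); the column name `stat` and the object name `result` are made up here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(a = c(1, 4, 3, 5), b = c(3, 6, 3, 11))

result <- mydata %>%
  # one row per (column name, value) pair
  pivot_longer(everything()) %>%
  group_by(name) %>%
  # one row per original column, one column per statistic
  summarise(Min = min(value), Mean = mean(value),
            Max = max(value), Sd = sd(value)) %>%
  # stack the statistics into a 'stat' column (hypothetical name) ...
  pivot_longer(-name, names_to = "stat") %>%
  # ... and spread the original column names back out as columns
  pivot_wider(names_from = name, values_from = value)

result
```

The result should be a tibble with one row per statistic (Min, Mean, Max, Sd) and the columns `stat`, `a` and `b`.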
3

We can use sapply:

sapply(mydata, summary)

#>            a     b
#> Min.    1.00  3.00
#> 1st Qu. 2.50  3.00
#> Median  3.50  4.50
#> Mean    3.25  5.75
#> 3rd Qu. 4.25  7.25
#> Max.    5.00 11.00 

or if you don't want the quartiles:

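# returning a named numeric vector (c) rather than a list
# makes sapply simplify to a plain numeric matrix
sapply(mydata, function(x) c(Min = min(x), Mean = mean(x), 
                             Max = max(x), Sd = sd(x)))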
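
Note that `sapply` returns a matrix here, not a data frame. If a data frame with columns a and b is what is needed, one possible way (a sketch only; the names `stats` and `stats_df` are made up for illustration) is to convert the matrix with `as.data.frame`, which keeps the statistic names as row names:

```r
mydata <- data.frame(a = c(1, 4, 3, 5), b = c(3, 6, 3, 11))

# each call returns a named numeric vector, so sapply simplifies
# the result to a numeric matrix: statistics in rows, columns a and b
stats <- sapply(mydata, function(x) c(Min = min(x), Mean = mean(x),
                                      Max = max(x), Sd = sd(x)))

# convert to a data frame; the row names carry the statistic labels
stats_df <- as.data.frame(stats)
stats_df
```

Individual values can then be looked up by row name and column, e.g. `stats_df["Mean", "a"]`.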

A tidyverse solution would be possible using purrr::map:

library(purrr)

mydata %>% 
    map(~summary(.)) %>% 
    rbind.data.frame
M--