Extract multiple descriptive statistics

Question

Given a data frame like this:

   df <-  structure(list(id = c(43, 11, 24, 12), a = c(0.291435739245075, 
    0.309022489024281, 0.342122441665493, 0.302379459085847), b = c(0.200071678165039, 
    0.190343927195464, 0.279532043979674, 0.273976986189153), c = c(0.821534168725281, 
    0.789752582333892, 0.650428149039385, 0.787013452455617), d = c(0.173486758738976, 
    0.176046693204654, 0.242694587018572, 0.233888412456641), e = c(0.435969639177237, 
    0.435739245075326, 0.440206330717933, 0.481318878236717), a1 = c(0.292370115325048, 
    0.306001766354781, 0.292792504511884, 0.301265887593278), a2 = c(0.202618812958388, 
    0.212948148527398, 0.271737043531686, 0.215482483648419), a3 = c(0.796331613910684, 
    0.765138812446401, 0.782341572055755, 0.800798699553291), a4 = c(0.176161890255609, 
    0.202567614269075, 0.198510118140976, 0.191623894428303), a5 = c(0.431822545342839, 
    0.458753055921768, 0.47073354922114, 0.424501132771001)), row.names = c(NA, 
    4L), class = "data.frame")

How can I extract the following descriptive statistics for every column?

5% Mean Median 95% SD

`apply( df, 2, summary )` ? repeat with `quantile`, `mean` and/or `sd` as your function.. — Wimpel, Feb 12 '20 at 13:41
R has all those functions with `quantile` for percentiles. Give it a try and come back with a specific question. — Parfait, Feb 12 '20 at 13:42
Before you post questions, *please* look around SO for posts that already deal with your issue. The SO search isn't great; using Google with "site:stackoverflow.com" works well. For this question, there's also [this](https://stackoverflow.com/q/34594641/5325862), [this](https://stackoverflow.com/q/50432751/5325862), [this](https://stackoverflow.com/q/57068164/5325862), and [this](https://stackoverflow.com/q/37005445/5325862) that came up pretty quickly. Also post what you've tried so far so folks aren't doing your project for you from scratch — camille, Feb 12 '20 at 15:13
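
The base R route hinted at in the comments can be sketched roughly as follows. This is only an illustration, not code from the thread: `toy` is a made-up stand-in for the question's `df`, and the statistic labels (`Q5`, `Q95`, and so on) are arbitrary names chosen here.

```r
# Toy data standing in for the question's df (values are illustrative only)
toy <- data.frame(id = c(43, 11, 24, 12),
                  a  = c(0.29, 0.31, 0.34, 0.30),
                  b  = c(0.20, 0.19, 0.28, 0.27))

# Build one named vector of statistics per column (id is dropped with [-1]);
# sapply() binds the vectors into a numeric matrix, one column per variable
stats <- sapply(toy[-1], function(v) {
  c(Q5     = unname(quantile(v, probs = 0.05)),
    Mean   = mean(v),
    Median = median(v),
    Q95    = unname(quantile(v, probs = 0.95)),
    SD     = sd(v))
})

# Transpose so that each variable becomes a row and each statistic a column
t(stats)
```

Swapping `toy` for `df` should give the same layout for the real data.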

UseR10085 · Answer 1 · 2020-02-12T14:32:13.133

You can reshape the data to long format and then summarise each variable. The skewness() and kurtosis() functions come from the moments package:

library(dplyr)
library(moments)   # skewness() and kurtosis()
library(reshape2)  # melt()

# Reshape to long format: one row per (id, variable) pair
df.m <- melt(df, id = "id")

df.m %>% 
  group_by(variable) %>% 
  summarize(min = min(value), max = max(value), 
            mean = mean(value), q5 = quantile(value, probs = 0.05), 
            median = median(value), q95 = quantile(value, probs = 0.95),
            sd = sd(value), skewness = skewness(value), kurtosis = kurtosis(value))

Alternatively, a tidyverse solution that uses pivot_longer() in place of melt(), so the reshape2 package is not required:

library(tidyverse)
library(moments)  # skewness() and kurtosis()

# pivot_longer() stacks every column except id into name/value pairs
df %>% pivot_longer(-id) %>% 
  group_by(name) %>% 
  summarize(min = min(value), max = max(value), 
            mean = mean(value), q5 = quantile(value, probs = 0.05), 
            median = median(value), q95 = quantile(value, probs = 0.95),
            sd = sd(value), skewness = skewness(value), kurtosis = kurtosis(value))
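
If reshaping is not wanted at all, a different technique is dplyr's across() (available from dplyr 1.0.0), which applies a list of functions to each selected column. The sketch below is an illustration rather than part of the original answer; `toy` is made-up data standing in for `df`, and the result is a single wide row instead of one row per variable.

```r
library(dplyr)

# Toy data standing in for the question's df (values are illustrative only)
toy <- data.frame(id = c(43, 11, 24, 12),
                  a  = c(0.29, 0.31, 0.34, 0.30),
                  b  = c(0.20, 0.19, 0.28, 0.27))

# One row, with one column per variable/statistic pair (a_q5, a_mean, ...)
wide <- toy %>%
  summarise(across(-id, list(
    q5     = ~ unname(quantile(.x, probs = 0.05)),
    mean   = ~ mean(.x),
    median = ~ median(.x),
    q95    = ~ unname(quantile(.x, probs = 0.95)),
    sd     = ~ sd(.x)
  )))

wide
```

The long format produced by pivot_longer() is usually easier to read when there are many columns; the wide form is mainly convenient when the result feeds straight into further column-wise code.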

score 1 · Answer 2 · answered Feb 12 '20 at 14:39

You can define a custom function f that computes all five statistics for a single column:

f <- Vectorize(function(v) {
  data.frame(Qu5=quantile(v,0.05),
             Mean = mean(v),
             Median = median(v),
             Qu95 = quantile(v,0.95),
             SD = sd(v))
})

and then apply it to every column except id with df_stat <- f(df[-1]), which gives

> df_stat
       a          b          c          d          e          a1          a2         a3       
Qu5    0.2930773  0.1918031  0.6709159  0.1738707  0.4357738  0.2924335   0.2041682  0.7677192
Mean   0.31124    0.2359812  0.7621821  0.2065291  0.4483085  0.2981076   0.2256966  0.7861527
Median 0.305701   0.2370243  0.788383   0.2049676  0.438088   0.2970292   0.2142153  0.7893366
Qu95   0.3371574  0.2786988  0.8167669  0.2413737  0.475152   0.3052914   0.2632989  0.8001286
SD     0.02182781 0.04730261 0.07613221 0.03686664 0.02210252 0.006669873 0.03119379 0.0160647
       a4        a5        
Qu5    0.1784812 0.4255993 
Mean   0.1922159 0.4464526 
Median 0.195067  0.4452878 
Qu95   0.201959  0.4689365 
SD     0.0116169 0.02188433
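
One thing to be aware of: because f returns a data frame for each column, the object produced this way is a matrix whose cells are list elements rather than plain numbers. If a numeric matrix is needed for further arithmetic, a conversion along these lines should work. This is a sketch on made-up data (`toy` and `num` are names introduced here, not part of the answer).

```r
# Toy data standing in for the question's df (values are illustrative only)
toy <- data.frame(id = c(43, 11, 24, 12),
                  a  = c(0.29, 0.31, 0.34, 0.30),
                  b  = c(0.20, 0.19, 0.28, 0.27))

# Same function as in the answer above
f <- Vectorize(function(v) {
  data.frame(Qu5 = quantile(v, 0.05),
             Mean = mean(v),
             Median = median(v),
             Qu95 = quantile(v, 0.95),
             SD = sd(v))
})

df_stat <- f(toy[-1])   # a matrix of mode "list"

# Flatten the cells column by column and rebuild a numeric matrix,
# reusing the row and column names of the list matrix
num <- matrix(unlist(df_stat),
              nrow = nrow(df_stat),
              dimnames = dimnames(df_stat))

num
```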
