I have a few data frames.

I need to display basic statistics together with the interquartile range (IQR) in a single table covering all of them.

Unfortunately, the summary function does not return the IQR. fivenum, on the other hand, returns the lower and upper hinges from which the IQR can be derived, but it cannot (?) be applied to a list of data frames, and I don't need the median.

Since I was unable to find an appropriate function, I wrote one myself:

removeXYcol <- function(df)
{
  # removes coordinates
  drops <- c("X","Y")
  withoutXY<- df[,!(names(df) %in% drops)]
  withoutXY
}

getStatsTable <- function(listOfDataFrames, df_names = NULL, digits_no = 2)
{
  # returns table with statistics (together with IQR which is not returned by summary)
  # start from NULL: exists("myStats") would also find a global variable of that name
  myStats <- NULL
  for (df in listOfDataFrames){
    df_data <- unlist(removeXYcol(df))

    minimum <- round(min(df_data, na.rm = TRUE), digits = digits_no)
    maximum <- round(max(df_data, na.rm = TRUE), digits = digits_no)
    average <- round(mean(df_data, na.rm = TRUE), digits = digits_no)
    IQR_ <- round(IQR(df_data, na.rm = TRUE), digits = digits_no)

    toReturn <- c(minimum, maximum, average, IQR_)
    # rbind(NULL, x) gives a one-row matrix, so a single data frame also works
    myStats <- rbind(myStats, toReturn)

  }
  colnames(myStats) <- c("minimum", "maximum", "average", "IQR")
  if (is.null(df_names)) {
    df_names <- seq(length(listOfDataFrames))
  }
  rownames(myStats) <- df_names
  return(myStats)
}

However, I wonder whether there is a simpler solution.

alistaire
matandked

1 Answer

fivenum takes a vector, so you have to lapply across the list and then lapply across the columns of each data.frame, e.g.

lapply(list(mtcars, mtcars), function(df){lapply(df, fivenum)})

## [[1]]
## [[1]]$mpg
## [1] 10.40 15.35 19.20 22.80 33.90
## 
## [[1]]$cyl
## [1] 4 4 6 8 8
## ...

An alternative is purrr::at_depth, which lets you specify the level of the list you want to iterate over (later purrr releases deprecate it; purrr::map_depth takes the same arguments here):

purrr::at_depth(list(mtcars, mtcars), 2, fivenum)

## [[1]]
## [[1]]$mpg
## [1] 10.40 15.35 19.20 22.80 33.90
## 
## [[1]]$cyl
## [1] 4 4 6 8 8
## ...
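Since the question is really about the IQR, it may help to see how the fivenum output above relates to it. The following is a minimal sketch using only base R and the built-in mtcars data; the helper name hinge_spread is made up for illustration and is not part of any package.

```r
# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum.
x <- mtcars$mpg
five <- fivenum(x)

# Hypothetical helper: distance between Tukey's hinges (elements 2 and 4).
hinge_spread <- function(v) {
  f <- fivenum(v)  # fivenum() removes NAs by default (na.rm = TRUE)
  f[4] - f[2]
}

hinge_spread(x)  # spread based on the hinges
IQR(x)           # spread based on quantile(x, c(0.25, 0.75)), type 7 by default
```

The two values are usually close but need not be identical, because hinges and the default quantile definition can disagree for some sample sizes. If the IQR itself is what should be reported, calling IQR directly is probably the safer choice.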

A more involved version that returns a more nicely formatted list of data.frames:

library(tidyverse)

list(mtcars, mtcars) %>% 
    map(gather, var, val) %>% 
    map(group_by, var) %>% 
    map(summarise, 
        val = list(fivenum(val)), 
        label = list(c('min', 'q1', 'med', 'q3', 'max'))) %>% 
    map(unnest) %>% 
    map(spread, label, val)

## [[1]]
## # A tibble: 11 × 6
##      var     max     med    min       q1     q3
## *  <chr>   <dbl>   <dbl>  <dbl>    <dbl>  <dbl>
## 1     am   1.000   0.000  0.000   0.0000   1.00
## 2   carb   8.000   2.000  1.000   2.0000   4.00
## 3    cyl   8.000   6.000  4.000   4.0000   8.00
## ...
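If the goal is the single table described in the question (minimum, maximum, mean and IQR, one row per data frame), a base R sketch along the following lines may be enough. The function name stats_table and its arguments are hypothetical; it assumes that every column other than the X and Y coordinates is numeric.

```r
# Hypothetical helper: one row of statistics per data frame.
# Assumes all columns except the coordinate columns "X" and "Y" are numeric.
stats_table <- function(dfs, df_names = seq_along(dfs), digits = 2) {
  one_row <- function(df) {
    # drop = FALSE keeps a data frame even if only one column remains
    v <- unlist(df[, !(names(df) %in% c("X", "Y")), drop = FALSE])
    c(minimum = min(v, na.rm = TRUE),
      maximum = max(v, na.rm = TRUE),
      average = mean(v, na.rm = TRUE),
      IQR     = IQR(v, na.rm = TRUE))
  }
  # Stack the named vectors into a matrix with one row per data frame
  out <- do.call(rbind, lapply(dfs, one_row))
  rownames(out) <- df_names
  round(out, digits)
}

stats_table(list(mtcars, mtcars), df_names = c("first", "second"))
```

The result is a plain numeric matrix with named rows and columns, so it should be usable with table helpers such as xtable.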
alistaire
    The first version works for me, but the result is not a nice table (it cannot be passed to xtable). I have a problem with the 'tidyverse' installation, but I believe that it works. – matandked Feb 12 '17 at 16:16