0

I have a dataset and I need to create a simple table with the number of observations, means, and standard deviations of all the variables (columns). I can't find a way to get only the required 3 summary statistics. Everything I tried keeps giving me min, max, median, 1st and 3rd quartiles, etc. The table should look something like this (with a title):

Table 1: Table Title
_______________________________________
Variables   Observations  Mean  Std.Dev
_______________________________________
Age            30          24      2
...             .          .       .
...             .          .       .
_______________________________________

The summary () does not work because it gives too many other summary statistics. I have done this:

sapply(dataset, function(x) list(means=mean(x,na.rm=TRUE), sds=sd(x,na.rm=TRUE)))

But how do I form the table from this? And is there a better way to do this than using "sapply"?

Pineapple
  • 193
  • 8

1 Answers1

2

sapply does return the values that you want but it is not properly structured.

Using mtcars data as an example :

#Get the required statistics and convert the data into dataframe
summ_data <- data.frame(t(sapply(mtcars, function(x) 
                     list(means = mean(x,na.rm=TRUE), sds = sd(x,na.rm=TRUE)))))
#Change rownames to new column
summ_data$variables <- rownames(summ_data)
#Remove rownames
rownames(summ_data) <- NULL
#Make variable column as 1st column
cbind(summ_data[ncol(summ_data)], summ_data[-ncol(summ_data)])

Another way would be using dplyr functions :

library(dplyr)

mtcars %>%
  summarise(across(.fns = list(means = mean, sds = sd), 
                   .names = '{col}_{fn}')) %>%
  tidyr::pivot_longer(cols = everything(), 
               names_to = c('variable', '.value'), 
               names_sep = '_')

# A tibble: 11 x 3
#   variable   means     sds
#   <chr>      <dbl>   <dbl>
# 1 mpg       20.1     6.03 
# 2 cyl        6.19    1.79 
# 3 disp     231.    124.   
# 4 hp       147.     68.6  
# 5 drat       3.60    0.535
# 6 wt         3.22    0.978
# 7 qsec      17.8     1.79 
# 8 vs         0.438   0.504
# 9 am         0.406   0.499
#10 gear       3.69    0.738
#11 carb       2.81    1.62 
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Ronak, thank you! The first part worked. But the second method with dplyr does not work. Returns error: ```Error in across(.fns = list(means = mean, sds = sd), .names = "{col}_{fn}") : could not find function "across"``` Any ideas? – Pineapple Oct 02 '20 at 07:46
  • `across` is from latest version of `dplyr`. you should have dplyr 1.0.0 or higher installed for that. Check `packageVersion('dplyr')` – Ronak Shah Oct 02 '20 at 07:47
  • Yesssss!! It worked. Thank you! Did not have the updated version. Btw, do you have any pointers on where I can look to learn about producing tables and getting them directly in a word file? I can do this with Stata, but I am new to R. – Pineapple Oct 02 '20 at 08:09
  • 1
    There is `officer` package that can help to do that. Also I found this post which might be helpful for you https://stackoverflow.com/questions/25425993/data-frame-to-word-table – Ronak Shah Oct 02 '20 at 08:23