R question: how to save summary results into a dataset

Question

I'm a SAS programmer trying to learn R. In SAS, I would do this to save the results of descriptive statistics into a dataset:

proc means data=abc;
var var1 var2 var3;
ods output summary=result1;
run;

I think in R, it would be this: summary(abc)->result1

Someone told me to do this instead: as.data.frame(unclass(summary(new_scales)))->new_table

But the result in this table is not very usable.

Is there a way to get a better-structured result, like the one I would get from SAS PROC MEANS? I would like the columns to be: variable name, mean, SD, min, max, etc., with each row carrying the results for one variable.

Parfait · Answer 1 · 2019-07-16T19:58:53.857

2

Consider sapply (a hidden loop that returns an object of the same length as its input) to create a vector or matrix of aggregation results:

# SINGLE AGGREGATE
stats_vector <- sapply(abc[c("var1", "var2", "var3")], function(x) mean(x, na.rm=TRUE))

# MULTIPLE AGGREGATES
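# na.rm=TRUE skips missing values; count=length(x) still includes them
stats_matrix <- sapply(abc[c("var1", "var2", "var3")], 
    function(x) c(count=length(x), sum=sum(x, na.rm=TRUE), mean=mean(x, na.rm=TRUE),
                  min=min(x, na.rm=TRUE), q1=quantile(x, 0.25, na.rm=TRUE, names=FALSE),
                  median=median(x, na.rm=TRUE), q3=quantile(x, 0.75, na.rm=TRUE, names=FALSE),
                  max=max(x, na.rm=TRUE), sd=sd(x, na.rm=TRUE))
)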
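To get the layout the question asks for (one row per variable, one column per statistic), the matrix returned by sapply can be transposed and converted to a data frame. The following is only a minimal sketch: the built-in mtcars data stands in for abc, the columns mpg, hp and wt stand in for var1 to var3, and the name result1 is borrowed from the question.

```r
# Assumption: mtcars stands in for the asker's abc data
vars <- c("mpg", "hp", "wt")

stats_matrix <- sapply(mtcars[vars], function(x) {
  c(n    = sum(!is.na(x)),           # non-missing count
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
})

# sapply puts statistics in rows and variables in columns, so transpose
result1 <- as.data.frame(t(stats_matrix))

# Move the variable names from the row names into a regular column
result1 <- cbind(variable = rownames(result1), result1)
rownames(result1) <- NULL

result1
```

The result is an ordinary data frame, so it can be saved, merged or exported like any other dataset.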

If your PROC MEANS uses a CLASS statement for grouping, then use aggregate, which returns a data frame:

# SINGLE AGGREGATE
mean_df <- aggregate(cbind(var1, var2, var3) ~ group, abc, function(x) mean(x, na.rm=TRUE))

# MULTIPLE AGGREGATES
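# na.rm=TRUE skips missing values; count=length(x) still includes them
agg_raw <- aggregate(cbind(var1, var2, var3) ~ group, abc, 
    function(x) c(count=length(x), sum=sum(x, na.rm=TRUE), mean=mean(x, na.rm=TRUE),
                  min=min(x, na.rm=TRUE), q1=quantile(x, 0.25, na.rm=TRUE, names=FALSE),
                  median=median(x, na.rm=TRUE), q3=quantile(x, 0.75, na.rm=TRUE, names=FALSE),
                  max=max(x, na.rm=TRUE), sd=sd(x, na.rm=TRUE))
)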

agg_df <- do.call(data.frame, agg_raw)
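The do.call(data.frame, ...) step matters because, when the function returns several values, aggregate stores them as a single matrix column per variable instead of separate columns. A small sketch of the difference, assuming mtcars as stand-in data with cyl playing the role of group:

```r
# Assumption: mtcars stands in for abc, and cyl for the grouping variable
agg_raw <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars,
                     FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

ncol(agg_raw)            # cyl plus one matrix column each for mpg and hp
is.matrix(agg_raw$mpg)   # each statistic block is a matrix inside the data frame

# Flatten the matrix columns into ordinary columns
agg_df <- do.call(data.frame, agg_raw)
names(agg_df)            # names are prefixed by variable, e.g. mpg.n, mpg.mean
```

Note that the formula interface of aggregate defaults to na.action = na.omit, so a row with a missing value in any of the listed variables is dropped before the function runs.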


edited Jul 16 '19 at 19:58

answered Jul 14 '19 at 04:15


I tried the single aggregate, `sapply(abc[c("GrowthMindset", "SelfEfficacy", "MSelfEfficacy", "MathAnxiety", "TeacherUse")], mean)`, and got NA for all five variables (GrowthMindset, SelfEfficacy, MSelfEfficacy, MathAnxiety, TeacherUse). I am not sure why this happened. The data does have missing values. – Kaz Jul 16 '19 at 05:30
See the update using the `na.rm=TRUE` argument of `mean`. Do the same for multiple aggregates. – Parfait Jul 16 '19 at 13:31
Thanks! I'm almost there with the aggregate statement for multiple aggregates. 1) How do I get n? n(x) didn't work. 2) If I want to save the result of this into a dataset, do I use "-> newdata"? It only lets me save the means, not the other statistics such as SDs. – Kaz Jul 16 '19 at 18:31
See update adding assignments to each with a special handling of multiple funcs for `aggregate`. And use `length` function for counts. – Parfait Jul 16 '19 at 19:59
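The two points raised in these comments (NA results and counts) can be seen on a toy vector. This is only an illustrative sketch; the vector x is made up and is not the asker's data.

```r
# Hypothetical vector with one missing value
x <- c(2, 4, NA, 6)

mean(x)                 # NA, because a single missing value propagates
mean(x, na.rm = TRUE)   # 4, computed from the three observed values
length(x)               # 4, counts the missing value as well
sum(!is.na(x))          # 3, the non-missing count
```

PROC MEANS reports N as the number of non-missing values, so `sum(!is.na(x))` is the closer match when the data contain missing values; `length(x)` gives the same answer only when nothing is missing.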

score 0 · Answer 2 · answered Jul 14 '19 at 04:32

Consider the tidyverse approach. The idea is to fit a model such as a linear regression to each group, map each model to its summary values with glance, and finally store the summary in a data frame.

library(tidyverse)
library(broom)
summary_result<-mtcars %>%
  nest(data = -carb) %>%  # tidyr >= 1.0 syntax; nest(-carb) is deprecated
  mutate(model = purrr::map(data, function(x) {
    lm(gear ~ mpg+cyl, data = x)}),
    values = purrr::map(model, glance),
    r.squared = purrr::map_dbl(values, "r.squared"),
    pvalue = purrr::map_dbl(values, "p.value")) %>%
  select(-data, -model, -values)

summary_result

  carb r.squared   pvalue
1    4    0.4352 0.135445
2    1    0.7011 0.089325
3    2    0.8060 0.003218
4    3    0.5017 0.498921
5    6    0.0000       NA
6    8    0.0000       NA
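The example above stores model fit statistics rather than the descriptive statistics the question asks about. For means, SDs and so on with one row per variable, a tidyverse sketch might look like the following. It assumes dplyr (0.8 or later) and tidyr (1.0 or later) are installed, and uses mtcars with the columns mpg, hp and wt as stand-ins for the asker's data; the name desc_stats is illustrative.

```r
library(dplyr)
library(tidyr)

# Assumption: mtcars and these three columns stand in for abc and var1-var3
desc_stats <- mtcars %>%
  select(mpg, hp, wt) %>%
  # Stack the variables into long form: one row per variable/value pair
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  # One summary row per variable, one column per statistic
  summarise(n    = sum(!is.na(value)),
            mean = mean(value, na.rm = TRUE),
            sd   = sd(value, na.rm = TRUE),
            min  = min(value, na.rm = TRUE),
            max  = max(value, na.rm = TRUE))

desc_stats
```

Adding a grouping column to both the select and group_by steps (and excluding it from pivot_longer) would give the equivalent of a CLASS statement.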

I will try this approach soon and let you know how it went. Thanks. – Kaz Jul 16 '19 at 21:33
