
I'm a SAS programmer trying to learn R. In SAS, I would do this to save the descriptive statistics into a dataset:

proc means data=abc;
var var1 var2 var3;
ods output summary=result1;
run;

I think in R, it would be this: summary(abc)->result1

Someone told me to do this: as.data.frame(unclass(summary(new_scales)))->new_table

But the result in this table is not very usable.
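The reason it is hard to use: summary() on a data frame returns a "table" object whose cells are formatted text, not numbers. A minimal sketch with made-up data standing in for new_scales (the values are hypothetical):

```r
# Hypothetical toy data standing in for new_scales
new_scales <- data.frame(var1 = c(1, 2, 3, NA), var2 = c(10, 20, 30, 40))

s <- summary(new_scales)
class(s)        # "table"

# Underneath the "table" class is a character matrix: each cell is a string
# combining the statistic's label and its formatted value
m <- unclass(s)
typeof(m)       # "character"
m[, 1]
```

So converting it to a data frame still leaves text columns, with one column per variable and the statistics buried inside the strings.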

Is there a way to get a better-structured result, like the one I would get from SAS PROC MEANS? I would like the columns to be variable name, Mean, SD, min, max, etc., with one row of results for each variable.

Kaz



Consider sapply (a hidden loop that returns one result per input column) to build a matrix of aggregate results:

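# SINGLE AGGREGATE
stats_vector <- sapply(abc[c("var1", "var2", "var3")], function(x) mean(x, na.rm=TRUE))

# MULTIPLE AGGREGATES (na.rm=TRUE skips missing values, as PROC MEANS does)
stats_matrix <- sapply(abc[c("var1", "var2", "var3")],
    function(x) c(count=sum(!is.na(x)),  # non-missing count, like SAS N
                  sum=sum(x, na.rm=TRUE), mean=mean(x, na.rm=TRUE), min=min(x, na.rm=TRUE),
                  q1=quantile(x, 0.25, na.rm=TRUE, names=FALSE),
                  median=median(x, na.rm=TRUE),
                  q3=quantile(x, 0.75, na.rm=TRUE, names=FALSE),
                  max=max(x, na.rm=TRUE), sd=sd(x, na.rm=TRUE)))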
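sapply() returns the statistics as rows and the variables as columns, which is the transpose of the PROC MEANS layout asked for in the question. A minimal sketch, using hypothetical toy data in place of abc and a shorter list of statistics, transposes the matrix into a data frame with one row per variable:

```r
# Hypothetical toy data standing in for the question's abc
abc <- data.frame(var1 = c(2, 4, 6, NA),
                  var2 = c(1, 3, 5, 7),
                  var3 = c(10, 20, 30, 40))

# Statistics end up as rows, variables as columns
stats_matrix <- sapply(abc[c("var1", "var2", "var3")],
    function(x) c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE),
                  sd = sd(x, na.rm = TRUE), min = min(x, na.rm = TRUE),
                  max = max(x, na.rm = TRUE)))

# Transpose so each variable is a row, and keep its name as a column
result1 <- data.frame(variable = colnames(stats_matrix), t(stats_matrix),
                      row.names = NULL, stringsAsFactors = FALSE)
result1
```

result1 is an ordinary data frame, so it stays in the session like a SAS dataset and can be written out with write.csv() if a file is needed.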

If your PROC MEANS uses a CLASS statement for grouping, use aggregate, which returns a data frame:

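# SINGLE AGGREGATE
mean_df <- aggregate(cbind(var1, var2, var3) ~ group, abc, function(x) mean(x, na.rm=TRUE),
                     na.action=na.pass)  # default na.omit drops rows with an NA in any variable

# MULTIPLE AGGREGATES
agg_raw <- aggregate(cbind(var1, var2, var3) ~ group, abc,
    function(x) c(count=sum(!is.na(x)), sum=sum(x, na.rm=TRUE), mean=mean(x, na.rm=TRUE),
                  min=min(x, na.rm=TRUE), q1=quantile(x, 0.25, na.rm=TRUE, names=FALSE),
                  median=median(x, na.rm=TRUE), q3=quantile(x, 0.75, na.rm=TRUE, names=FALSE),
                  max=max(x, na.rm=TRUE), sd=sd(x, na.rm=TRUE)),
    na.action=na.pass)

# aggregate stores each variable's statistics as a matrix column;
# do.call(data.frame, ...) flattens them into var1.count, var1.sum, ...
agg_df <- do.call(data.frame, agg_raw)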
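A small self-contained run of the aggregate approach shows both the flattened column names and why na.action matters with missing data. The abc below, with a group column and one missing var1 value, is hypothetical, and stats_fun is an illustrative helper:

```r
# Hypothetical data: two groups, one missing var1 value in group "b"
abc <- data.frame(group = c("a", "a", "b", "b", "b"),
                  var1  = c(1, 3, 2, 4, NA),
                  var2  = c(10, 30, 20, 40, 60))

# Hypothetical helper returning a named vector of statistics
stats_fun <- function(x) c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE))

# Default na.action = na.omit: the whole fifth row is dropped, so var2 loses its 60 too
agg_omit <- do.call(data.frame, aggregate(cbind(var1, var2) ~ group, abc, stats_fun))

# na.action = na.pass keeps the row; stats_fun handles the NA per variable
agg_pass <- do.call(data.frame, aggregate(cbind(var1, var2) ~ group, abc, stats_fun,
                                          na.action = na.pass))

names(agg_pass)     # "group" "var1.n" "var1.mean" "var2.n" "var2.mean"
agg_omit$var2.mean  # 20 30
agg_pass$var2.mean  # 20 40
```

With na.pass, each variable is summarized over its own non-missing values, which is closer to how PROC MEANS treats missing values per variable.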


Parfait
  • I tried this: `sapply(abc[c("GrowthMindset", "SelfEfficacy", "MSelfEfficacy", "MathAnxiety", "TeacherUse")], mean)` and got `NA` for every variable (GrowthMindset, SelfEfficacy, MSelfEfficacy, MathAnxiety, TeacherUse). I am not sure why this happened. The data does have missing values. – Kaz Jul 16 '19 at 05:30
  • See update using the `na.rm=TRUE` argument of `mean`. Do the same for multiple aggregates. – Parfait Jul 16 '19 at 13:31
  • Thanks! I'm almost there with the aggregate statement for multiple aggregates. 1) How do I get n? `n(x)` didn't work. 2) If I want to save the result into a dataset, do I use "-> newdata"? It only lets me save the means, not the other statistics such as the SDs. – Kaz Jul 16 '19 at 18:31
  • See update adding assignments to each with a special handling of multiple funcs for `aggregate`. And use `length` function for counts. – Parfait Jul 16 '19 at 19:59
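The all-NA output and the question about n both come down to how R handles missing values. A minimal sketch with a made-up vector:

```r
x <- c(2, 4, NA, 6)   # hypothetical values with one missing entry

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # 4: missing values are dropped first
length(x)              # 4: counts the NA as well
sum(!is.na(x))         # 3: non-missing count, like SAS N
```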

Consider the tidyverse approach. The idea is to nest the data by group (here carb), fit a model such as a linear regression to each group, extract model-level statistics with broom's glance(), and finally store the summary in a data frame.

library(tidyverse)
library(broom)

summary_result <- mtcars %>%
  nest(data = -carb) %>%    # one row per carb value; other columns nested in `data`
  mutate(model = purrr::map(data, function(x) {
    lm(gear ~ mpg + cyl, data = x)}),
    values = purrr::map(model, glance),                # one-row model summary per group
    r.squared = purrr::map_dbl(values, "r.squared"),
    pvalue = purrr::map_dbl(values, "p.value")) %>%
  select(-data, -model, -values)

summary_result

  carb r.squared   pvalue
1    4    0.4352 0.135445
2    1    0.7011 0.089325
3    2    0.8060 0.003218
4    3    0.5017 0.498921
5    6    0.0000       NA
6    8    0.0000       NA
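The nest, fit, extract pattern does not depend on the tidyverse. A rough base-R sketch uses split() in place of nest() and lapply() in place of purrr::map(). It groups mtcars by cyl and fits mpg ~ wt, a grouping and formula swapped in for illustration only, because several carb groups have too few rows for a meaningful fit. The p-value computed here is the overall F-test p-value, which is what glance() reports as p.value for lm models.

```r
# split() plays the role of nest(): one data frame per cyl value
groups <- split(mtcars, mtcars$cyl)

# lapply() plays the role of purrr::map(): fit and summarize each group
fit_stats <- lapply(groups, function(d) {
  fit <- summary(lm(mpg ~ wt, data = d))
  f <- fit$fstatistic  # named vector: value, numdf, dendf
  data.frame(r.squared = fit$r.squared,
             pvalue = unname(pf(f["value"], f["numdf"], f["dendf"],
                                lower.tail = FALSE)))
})

# Stack the one-row results and keep the group key as a column
summary_result <- data.frame(cyl = as.numeric(names(fit_stats)),
                             do.call(rbind, fit_stats), row.names = NULL)
summary_result
```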
mnm