I have this code:

data_2012 %>%
  group_by(job2) %>%
  filter(!is.na(job2)) %>%
  summarise(mean = mean(persinc2, na.rm = TRUE),
            sd = sd(persinc2, na.rm = TRUE))

This gives me a little table for that specific variable, which is perfect. However, I have multiple variables that I want the mean and SD for, all in one table. How do I do that?

I am very new to R.

Maël
  • You may want to use a package. There are many functions that can summarize a whole dataset in one line. `dplyr::glimpse()` is one. `qwraps2` might also be helpful: https://cran.r-project.org/web/packages/qwraps2/vignettes/summary-statistics.html. And there's a nice article here: https://thatdatatho.com/easily-create-descriptive-summary-statistic-tables-r-studio/ – dash2 Jan 11 '22 at 16:53
  • Hi! Does the reply answer your question? – Maël Jan 13 '22 at 12:11

2 Answers

You can use across() inside summarise() and choose your columns with tidyselect syntax:

data_2012 %>%
  group_by(job2) %>%
  filter(!is.na(job2)) %>%
  summarise(across(your_columns, list(mean = ~ mean(.x, na.rm = TRUE),
                                      sd = ~ sd(.x, na.rm = TRUE))))

With a toy dataset

iris %>% 
  group_by(Species) %>% 
  summarise(across(everything(), list(mean = ~ mean(.x, na.rm = TRUE),
                                      sd = ~ sd(.x, na.rm = TRUE))))
# A tibble: 3 x 9
  Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean Sepal.Width_sd
  <fct>                  <dbl>           <dbl>            <dbl>          <dbl>
1 setosa                  5.01           0.352             3.43          0.379
2 versicolor              5.94           0.516             2.77          0.314
3 virginica               6.59           0.636             2.97          0.322
# ... with 4 more variables: Petal.Length_mean <dbl>, Petal.Length_sd <dbl>,
#   Petal.Width_mean <dbl>, Petal.Width_sd <dbl>
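
everything() works for iris because every non-grouping column is numeric. In a real dataset you will usually want to restrict the selection. Here is a minimal sketch of two common ways to do that, again on iris; the choice of Sepal.Length and Petal.Length is only for illustration, and the .names argument shown is the default naming pattern spelled out explicitly:

```r
library(dplyr)

# Option 1: name the columns explicitly with c(...)
by_name <- iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length, Petal.Length),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.col}_{.fn}"))

# Option 2: take every numeric column, skipping character/factor columns
numeric_only <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd = ~ sd(.x, na.rm = TRUE))))

by_name
numeric_only
```

For the data in the question, the same pattern would presumably look like across(c(persinc2, other_var), ...), where other_var stands in for whatever additional columns you need.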
Maël
With base R, we can use split() to split the data by a factor variable. This returns a list with one element per level of that factor. We can then obtain the mean and SD (or any other statistic you like) per column per level using members of the *apply() family as follows:

# toy data
df <- mtcars[, 1:5]

# splitting by a factor variable
lapply(split(df, df$cyl), function(x) {
  sapply(x, function(i) data.frame(Mean=mean(i), SD=sd(i)))
})

Output

$`4`
     mpg      cyl disp     hp       drat     
Mean 26.66364 4   105.1364 82.63636 4.070909 
SD   4.509828 0   26.87159 20.93453 0.3654711

$`6`
     mpg      cyl disp     hp       drat     
Mean 19.74286 6   183.3143 122.2857 3.585714 
SD   1.453567 0   41.56246 24.26049 0.4760552

$`8`
     mpg      cyl disp     hp       drat     
Mean 15.1     8   353.1    209.2143 3.229286 
SD   2.560048 0   67.77132 50.97689 0.3723618
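
Since the question asks for everything in one table, here is a sketch of a base R variant that returns a single data frame rather than a list. It swaps split()/lapply() for aggregate(), so it is an alternative to the code above rather than the same method. Note that the formula interface of aggregate() drops rows with missing values by default, which does not matter for mtcars but may for your data.

```r
# toy data, same as above
df <- mtcars[, 1:5]

# aggregate() applies FUN to every non-grouping column within each level of cyl;
# because FUN returns two values, each summarised column is a two-column matrix
agg <- aggregate(. ~ cyl, data = df,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# flatten the matrix columns into ordinary columns (mpg.mean, mpg.sd, ...)
flat <- do.call(data.frame, agg)
flat
```

This gives one row per level of cyl and, unlike the split() version, does not summarise the grouping column itself.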
Dion Groothof