
I have a data frame with a numeric variable (`numeric`) and several factor variables, coded either 0/1 (FALSE/TRUE) or 0 to 4 (states of a pathology). For each level of each factor, I would like to summarise the median and IQR of `numeric`.

Is there a way to apply the following summary to every factor column in the dataset, without typing the variables one by one?

```r
library(dplyr)

group_by(df, othervariable) %>%
  summarise(
    count = n(),
    median = median(numeric, na.rm = TRUE),
    IQR = IQR(numeric, na.rm = TRUE)
  )
```

The output:

```
  othervariable count median   IQR
          <dbl> <int>  <dbl> <dbl>
1             0   100   2.46  2.65
2             1   207   3.88  5.86
```
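Because the real data contain personal information, a simulated stand-in with the same structure can serve as a reproducible example. This is only a sketch: the column names `feature_a`, `feature_b` and `stage`, the sample size and the distributions are hypothetical, chosen to mimic one skewed numeric measurement plus 0/1 and 0 to 4 factors.

```r
# Hypothetical stand-in for the real data: 300 "patients", one skewed
# numeric measurement and three grouping factors (all names are made up)
set.seed(123)
n_patients <- 300
df <- data.frame(
  numeric   = rlnorm(n_patients, meanlog = 1),
  feature_a = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  feature_b = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  stage     = factor(sample(0:4, n_patients, replace = TRUE), levels = 0:4)
)

str(df)

# Printable R code for the first rows, safe to paste into a question
dput(head(df))
```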

1 Answer


If your dataset contains only the grouping variables of interest plus `numeric`, you can use `purrr::map()` to run the `summarise()` call once for each grouping variable:

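```r
library(dplyr)

# Use every column except `numeric`, in turn, as the grouping variable
purrr::map(names(df %>% select(-numeric)), function(i) {
  df %>%
    group_by(!!sym(i)) %>%   # convert the column name (a string) to a symbol
    summarize(
      count = n(),
      median = median(numeric, na.rm = TRUE),
      IQR = IQR(numeric, na.rm = TRUE)
    )
})
```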

The result is a list of data frames, one per grouping variable, each containing the count, median and IQR of `numeric` for every level of that variable.
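An alternative (a different technique from the `map()` loop above) is to reshape all factor columns into long format with `tidyr::pivot_longer()` and group by variable name and level in a single pass, which gives one table instead of a list. A sketch, assuming dplyr >= 1.0 (for `across()` and `.groups`) and tidyr >= 1.0; the example data and the `variable`/`level` column names are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data with the same structure as in the question
set.seed(123)
n_patients <- 300
df <- data.frame(
  numeric   = rlnorm(n_patients, meanlog = 1),
  feature_a = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  feature_b = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  stage     = factor(sample(0:4, n_patients, replace = TRUE), levels = 0:4)
)

summary_long <- df %>%
  # Convert grouping columns to character so they stack into one column
  mutate(across(-numeric, as.character)) %>%
  # One row per (observation, grouping variable): columns `variable`, `level`
  pivot_longer(-numeric, names_to = "variable", values_to = "level") %>%
  group_by(variable, level) %>%
  summarise(
    count  = n(),
    median = median(numeric, na.rm = TRUE),
    IQR    = IQR(numeric, na.rm = TRUE),
    .groups = "drop"
  )

print(summary_long)
```

Converting the factors to character first means columns with different level sets (0/1 versus 0 to 4) stack cleanly into the single `level` column.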

Ric S
  • Thanks for the info about `across`. But the output of this code treats every value of `numeric` as a group. What I want is the median of `numeric` for groups 0 and 1 of variable 1, groups 0 and 1 of variable 2, etc. – dracoplasma Sep 30 '20 at 15:41
  • I'm sorry, but I don't think I understand what you would like to achieve. Could you paste the output of `dput(df)` into your question so that I have a sample of your dataset? – Ric S Sep 30 '20 at 15:56
  • I'm sorry I couldn't explain myself clearly. It's a very big dataset and contains personal data, sorry. I have a continuous numeric variable, "gene expression", and 77 categorical columns with 2 levels each (0 and 1, meaning "does not present this clinical feature" and "does") for around 300 patients (rows). I want the median and IQR of gene expression in each group (0 and 1) of each column. I can do it in SPSS, but I would prefer a more efficient way to export the results directly to PDF or XLS, rather than dealing with SPSS's output format. – dracoplasma Sep 30 '20 at 16:16
  • @dracoplasma Thanks, that explanation makes it clearer. I edited my answer; please check whether it does what you want. – Ric S Oct 01 '20 at 07:14
  • Worked perfectly, thanks! Now I just need to knit it into a neat PDF with R Markdown; I'll look into that. – dracoplasma Oct 02 '20 at 06:47
  • @dracoplasma Nice! If you found my answer useful, please consider upvoting and accepting it, as is customary on Stack Overflow :) – Ric S Oct 02 '20 at 06:54
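For the export to a spreadsheet mentioned in the comments, one option is to stack the answer's list of summaries into a single table and write it to a CSV file, which Excel opens directly. A sketch, assuming dplyr and purrr are installed; the example data, the file name and the `variable`/`level` column names are hypothetical. The file goes to a temporary directory here; change `out_file` to save it elsewhere.

```r
library(dplyr)
library(purrr)

# Hypothetical example data with the same structure as in the question
set.seed(123)
n_patients <- 300
df <- data.frame(
  numeric   = rlnorm(n_patients, meanlog = 1),
  feature_a = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  feature_b = factor(sample(0:1, n_patients, replace = TRUE), levels = 0:1),
  stage     = factor(sample(0:4, n_patients, replace = TRUE), levels = 0:4)
)

grouping_vars <- setdiff(names(df), "numeric")

# Same per-variable summary as in the answer, but as a list named by variable
summaries <- map(set_names(grouping_vars), function(i) {
  df %>%
    group_by(!!sym(i)) %>%
    summarise(
      count  = n(),
      median = median(numeric, na.rm = TRUE),
      IQR    = IQR(numeric, na.rm = TRUE)
    )
})

# Each tibble's first column is named after its grouping variable; rename it
# to `level` (as character) so all tibbles can be stacked into one table
combined <- summaries %>%
  map(~ .x %>% rename(level = 1) %>% mutate(level = as.character(level))) %>%
  bind_rows(.id = "variable")

# Write a CSV file that Excel (or any spreadsheet program) can open
out_file <- file.path(tempdir(), "numeric_summary.csv")
write.csv(combined, out_file, row.names = FALSE)
```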