
I have a dataframe with one date column and several columns with values (measured concentrations). I am mutating the dataframe and summarizing to years with averages of the values. This works fine:

library(dplyr)    

df <- data.frame(mydate=as.Date(c("2000-01-15", "2000-02-15", "2000-03-15")), columnA=c(2, 4, 5), columnB=c(3, 6, 7))
df_year <- df %>%
        mutate(year = format(mydate, format="%Y")) %>%
        group_by(year) %>%
        summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

The above code gives me averages. Is it possible to use ifelse so that df_year is summarized with an n-th percentile only in columns whose names contain "B"? That is, columnA would still be summarized as an average, but columnB would be summarized as a percentile.

I know how to compute quantiles, but I can't find an efficient way to use ifelse here. I don't want to create a new dataframe, since this one contains multiple columns that are later looped through when plotting. I am using grepl to catch the "B", but I get an 'unused argument' error. I'm looking for something like:

mutate...%>%
group_by...%>%
ifelse(grepl("B", each_column_name, fixed=TRUE)==TRUE,
summarise(across(where(is.numeric), ~ quantile(.x, probs=0.9, na.rm = TRUE))),
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))))

2 Answers


The first argument of across can combine multiple selection methods, including adding matches("B"):

df %>%
  mutate(year = format(mydate, format="%Y")) %>%
  group_by(year) %>%
  summarise(across(matches("B") & where(is.numeric), ~ mean(.x, na.rm = TRUE)))
# # A tibble: 1 × 2
#   year  columnB
#   <chr>   <dbl>
# 1 2000     5.33
r2evans
  • Thanks for your suggestion, which shows how to use multiple conditions in the summarize function! However, it doesn’t solve my problem with multiple columns, where some should be grouped with average and others grouped with percentile. I would like to have two different summarize lines for the same mutate: one with mean and one with quantile. – Martin Liungman Mar 13 '23 at 22:58

Maybe r2evans was correct and I didn’t get it. Anyway, I found a good description of how easy this is at https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/. Apparently, summarise handles this if I just put several across calls one after another, separated by commas. Example:

summarise(
    across(matches("A") & where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    across(matches("B") & where(is.numeric), ~ median(.x, na.rm = TRUE))
)
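
For a version that can be run end to end, here is a minimal sketch built on the example data from the question. It assumes the 90th percentile from the question's pseudocode (probs = 0.9) instead of the median. It also passes ignore.case = FALSE to matches(), because matches() ignores case by default, and uses !matches("B", ...) so that every other numeric column gets the mean. These choices are illustrative, not the only way to split the columns.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  mydate  = as.Date(c("2000-01-15", "2000-02-15", "2000-03-15")),
  columnA = c(2, 4, 5),
  columnB = c(3, 6, 7)
)

df_year <- df %>%
  mutate(year = format(mydate, format = "%Y")) %>%
  group_by(year) %>%
  summarise(
    # Numeric columns without an uppercase "B" in the name: mean
    across(!matches("B", ignore.case = FALSE) & where(is.numeric),
           ~ mean(.x, na.rm = TRUE)),
    # Numeric columns with an uppercase "B" in the name: 90th percentile
    # (names = FALSE drops the "90%" label from the result)
    across(matches("B", ignore.case = FALSE) & where(is.numeric),
           ~ quantile(.x, probs = 0.9, na.rm = TRUE, names = FALSE))
  )

print(df_year)
# One row for year 2000: columnA is the mean (about 3.67),
# columnB is the 90th percentile (6.8 with quantile's default type 7)
```

Because both across calls sit inside the same summarise, the result stays a single dataframe with one row per year, so any later loop over the columns for plotting should not need to change.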
  • Please update your code so that the result is reproducible; it is more helpful to others if generalised code is avoided. – L Tyrone Mar 26 '23 at 19:41