
I am trying to calculate grouped means using the collapse package and add them as new columns alongside the original data. Below is a data.table example of what I am trying to achieve.

library(data.table)
library(collapse)

data_1 <- as.data.table(airquality)
var_means <- c(
  "Ozone",
  "Solar.R",
  "Wind"
)
data_1[, paste0(var_means, "_mean") := lapply(.SD, mean, na.rm = TRUE), by = .(Month), .SDcols = var_means]
Maël
Vitalijs
  • [Aggregate / summarize multiple variables per group (e.g. sum, mean)](https://stackoverflow.com/a/61248352/1851712), by the author of the `collapse` package. – Henrik Jul 29 '22 at 10:59
  • @Henrik it does not really answer my question because the original data then disappears. – Vitalijs Jul 29 '22 at 12:43

2 Answers

3

There are at least a couple of ways. Using the dplyr-style syntax:

library(collapse)

var_means <- c(
  "Ozone",
  "Solar.R",
  "Wind"
)

airquality |>
  fgroup_by(Month) |>
  fmutate(across(var_means, fmean, .names = TRUE)) |>
  fungroup()

Or using ftransform():

ftransform(airquality,
           fmean(
             list(
               Ozone_mean = Ozone,
               Solar.R_mean = Solar.R,
               Wind_mean = Wind
             ),
             g = Month,
             TRA = 1
           ))   

Or, if you want to pass a character vector of column names, you need something like:

ftransform(airquality, 
           fmean(
             do.call(list, lapply(setNames(var_means, paste0(var_means, "_mean")), as.name)),
             g = Month,
             TRA = 1
           ))
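
The do.call() line above can look cryptic. As a minimal base R sketch of what it builds, the steps can be unpacked as follows. The intermediate names nm, syms, cl and res are introduced here for illustration only and are not part of the original answer:

```r
var_means <- c("Ozone", "Solar.R", "Wind")

# 1. Name each column with its target name: c(Ozone_mean = "Ozone", ...)
nm <- setNames(var_means, paste0(var_means, "_mean"))

# 2. Turn each column name into a symbol, keeping the new names
syms <- lapply(nm, as.name)

# 3. Build the unevaluated call list(Ozone_mean = Ozone, ...)
cl <- as.call(c(as.name("list"), syms))
print(cl)

# 4. Evaluating that call inside the data frame gives a named list of columns,
#    which is the kind of input fmean() receives in the ftransform() call
res <- eval(cl, datasets::airquality)
str(res)
```

In other words, the one-liner is a programmatic way of writing the explicit list(Ozone_mean = Ozone, ...) from the previous example.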
Ritchie Sacramento
  • Can you please elaborate on the last one using a character vector? It looks a bit cryptic with setNames and str2lang. – Vitalijs Jul 29 '22 at 16:49
1

Ritchie gave a good answer. I would add that you can pass the function in a named list to fmutate, so that the list name is used as the column suffix:

airquality |>
  fgroup_by(Month) |>
  fmutate(across(var_means, list(mean = fmean), .names = TRUE)) |>
  fungroup()

You could also use ftransform with compound pipes and the add_stub function:

library(magrittr)
airquality %>% ftransform(get_vars(., var_means) %>% fmean(Month, TRA = 1) %>% 
                          add_stub("_mean", pre = FALSE)) 

If you don't need to rename columns, a simple approach would also be to use settransformv, which overwrites the selected columns in place:

settransformv(airquality, var_means, fmean, Month, TRA = 1, apply = FALSE)

This comes very close to what you do with data.table. Here apply = FALSE ensures that fmean.data.frame is applied to the whole subset of the frame at once, so we only need to group once.

A final hybrid option is fcomputev with add_vars<- or ftransform<-. The latter is more intelligent (i.e. it would replace columns if executed again), but the former is faster:

add_vars(airquality) <- airquality |> 
    fcomputev(var_means, fmean, Month, TRA = 1, apply = FALSE) |> 
    add_stub("_mean", pre = FALSE)
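
As a quick sanity check for any of the approaches above, the expected columns can also be computed in base R with ave(). This is only a sketch for comparison, not a collapse idiom, and the name expected is introduced here for illustration:

```r
var_means <- c("Ozone", "Solar.R", "Wind")

# Start from the untouched dataset so the original columns are kept
expected <- datasets::airquality

# Add one <name>_mean column per variable, holding the Month-wise mean
for (v in var_means) {
  expected[[paste0(v, "_mean")]] <- ave(
    expected[[v]], expected$Month,
    FUN = function(x) mean(x, na.rm = TRUE)
  )
}

head(expected)
```

The result keeps the six original columns and appends the three group means, which is the shape the data.table example in the question produces.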
Sebastian