
I have a data frame with several variables, and I want to summarise them all in one call, applying a different summary function depending on the type of each variable.

The difficulty is that I want to take the variable types from a separate metadata data frame rather than from the usual tests (like "is.numeric" etc.).

Below is a reprex. I guess I should use a "match" inside the "where", but I don't even know whether two different across() calls can go in the same summarise(). Can they?

Any idea on how to write two proper tests that work?

Thanks


# a df
df <- data.frame(ID = letters[1:15],
                 Group = sample(1:3, 15, replace = TRUE),
                 Var1 = sample.int(15),
                 Var2 = sample.int(15),
                 Var3 = sample.int(15),
                 Var4 = sample.int(15))

# another df with meta data on variables = type 

metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                     Type = c(rep("stock", 2), rep("ratio", 2))) 

## summarise across different variables 
# using sum for "stock" type
# and mean for "ratio" type

groupDF <- df %>% 
  group_by(Group) %>%
  summarise(across(where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var), # not working
                   sum, na.rm = TRUE),
            across(where(names(.) %in% metaVar[metaVar$Type == "ratio", ]$Var), # not working
                   mean, na.rm = TRUE)) %>% # 
  ungroup

# Problem while evaluating `where(names(.) %in% metaVar[metaVar$Type == "stock", ]$Var)`

pgourdon

1 Answer


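You are overcomplicating this: there is no need for where() or for names(.) %in%. You can pass the character vector of column names straight to across(), and yes, two across() calls can sit in the same summarise().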

suppressPackageStartupMessages({
  library(dplyr)
})

## summarise across different variables 
# using sum for "stock" type
# and mean for "ratio" type

groupDF <- df %>% 
  group_by(Group) %>%
  summarise(across(metaVar$Var[metaVar$Type == "stock"], \(x) sum(x, na.rm = TRUE)),
            across(metaVar$Var[metaVar$Type == "ratio"], \(x) mean(x, na.rm = TRUE))) %>%
  ungroup()

groupDF
#> # A tibble: 3 × 5
#>   Group  Var1  Var2  Var3  Var4
#>   <int> <int> <int> <dbl> <dbl>
#> 1     1    23    13  6.67  6   
#> 2     2    47    69  8.5   9.67
#> 3     3    50    38  8.17  7.33

Created on 2023-03-22 with reprex v2.0.2
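
As a variation on the code above, the column names can be pulled out of metaVar first and wrapped in all_of(), which makes it explicit that they are external character vectors and raises an error if a listed column is missing from df. This is only a sketch: the names stockVars, ratioVars and groupDF2 are made up for illustration, and set.seed() is added here purely for reproducibility (it is not in the original reprex).

```r
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible

# same data as in the question
df <- data.frame(ID = letters[1:15],
                 Group = sample(1:3, 15, replace = TRUE),
                 Var1 = sample.int(15),
                 Var2 = sample.int(15),
                 Var3 = sample.int(15),
                 Var4 = sample.int(15))

metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                      Type = c(rep("stock", 2), rep("ratio", 2)))

# hypothetical helper vectors: column names for each type
stockVars <- metaVar$Var[metaVar$Type == "stock"]
ratioVars <- metaVar$Var[metaVar$Type == "ratio"]

groupDF2 <- df %>%
  group_by(Group) %>%
  summarise(across(all_of(stockVars), \(x) sum(x, na.rm = TRUE)),
            across(all_of(ratioVars), \(x) mean(x, na.rm = TRUE)),
            .groups = "drop")  # drops the grouping, so no ungroup() needed

groupDF2
```

The result should have the same shape as groupDF above: one row per Group, sums for Var1 and Var2, means for Var3 and Var4.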


Note

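I have used anonymous functions because passing extra arguments through `...` (as in across(..., sum, na.rm = TRUE)) now gives the following deprecation warning: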

#> Warning: There was 1 warning in `summarise()`.
#> ℹ In argument: `across(metaVar$Var[metaVar$Type == "stock"], sum, na.rm =
#>   TRUE)`.
#> ℹ In group 1: `Group = 1`.
#> Caused by warning:
#> ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
#> Supply arguments directly to `.fns` through an anonymous function instead.
#> 
#>   # Previously
#>   across(a:b, mean, na.rm = TRUE)
#> 
#>   # Now
#>   across(a:b, \(x) mean(x, na.rm = TRUE))
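
Since anonymous functions are needed anyway, one possible extension when there are many types is to keep the type-to-function mapping in a named list and look it up per column with cur_column(). This is a rough sketch, not something from the question: summaryFns, varType and groupDF3 are hypothetical names, and set.seed() is added only for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible

df <- data.frame(ID = letters[1:15],
                 Group = sample(1:3, 15, replace = TRUE),
                 Var1 = sample.int(15),
                 Var2 = sample.int(15),
                 Var3 = sample.int(15),
                 Var4 = sample.int(15))

metaVar <- data.frame(Var = c("Var1", "Var2", "Var3", "Var4"),
                      Type = c(rep("stock", 2), rep("ratio", 2)))

# hypothetical lookup: one summary function per variable type
summaryFns <- list(stock = \(x) sum(x, na.rm = TRUE),
                   ratio = \(x) mean(x, na.rm = TRUE))

# hypothetical named vector: variable name -> type
varType <- setNames(metaVar$Type, metaVar$Var)

groupDF3 <- df %>%
  group_by(Group) %>%
  summarise(across(all_of(metaVar$Var),
                   # pick the function matching the current column's type
                   \(x) summaryFns[[varType[[cur_column()]]]](x)),
            .groups = "drop")

groupDF3
```

With this layout, adding a new type only means adding one entry to summaryFns and the matching rows to metaVar; the summarise() call itself stays unchanged.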
Rui Barradas