I have a data frame which encapsulates a number of statistics for an exam, tracked over different years and groups. I would like to construct a function which adds new columns giving the change in these statistics for each group relative to each year in a dynamically supplied list of reference years.
Here is an example of the output I would like.
library(dplyr)    # mutate, across, group_by
library(purrr)    # map, set_names
library(stringr)  # str_c

grades <- data.frame(
  Group = c(rep("A", 4), rep("B", 4)),
  Year = rep(seq(2015, 2018), 2),
  Mean = c(seq(100, 130, 10), seq(200, 260, 20)),
  PassR = c(seq(0.5, 0.53, 0.01), seq(0.6, 0.66, 0.02))
)

grades |> group_by(Group) |> calculateDifferences(c(2015, 2016))
# A tibble: 8 × 8
# Groups:   Group [2]
  Group  Year  Mean PassR Mean_Diff2015 Mean_Diff2016 PassR_Diff2015 PassR_Diff2016
  <chr> <int> <dbl> <dbl>         <dbl>         <dbl>          <dbl>          <dbl>
1 A      2015   100   0.5             0           -10              0        -0.0100
2 A      2016   110  0.51            10             0         0.0100              0
3 A      2017   120  0.52            20            10         0.0200         0.0100
4 A      2018   130  0.53            30            20         0.0300         0.0200
5 B      2015   200   0.6             0           -20              0        -0.0200
6 B      2016   220  0.62            20             0         0.0200              0
7 B      2017   240  0.64            40            20         0.0400         0.0200
8 B      2018   260  0.66            60            40         0.0600         0.0400
My best attempt is the following function, but it runs into problems scoping the Year column inside the generated list of functions.
# Calculate differences from the given year for both mean and pass rate
calculateDifferences <- function(data, diffYears) {
  mutate(data,
    across(
      any_of(c("Mean", "PassR")),
      # list(Diff2015 = function(col) col - col[Year == 2015],
      #      Diff2016 = function(col) col - col[Year == 2016]),
      map(as.list(diffYears), function(year) { function(col) col - col[Year == year] }) |>
        set_names(str_c("Diff", diffYears)),
      .names = "{.col}_{.fn}"
    )
  )
}
Running this code complains that it cannot find the object Year. I've tried introducing some NSE to delay evaluation of the variable, but neither !!substitute("Year") nor !!quo("Year") produces the desired output: each merely throws a dplyr::mutate_incompatible_size <named_list> error. Trying to replace it with .data[["Year"]] complains that it's not in a data masking context.
If I hard-code the years (as in the commented section of the function) this runs correctly and produces the desired output, but it cannot adapt to a dynamically supplied list of years.
I can try to separately pull the Year column with data[["Year"]]. This works well if the data is ungrouped, but falls apart if the data is grouped.
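
For comparison, one way to sidestep the closure scoping altogether might be to drop across() and add one column at a time, so that .data is used directly inside mutate() where it is in a data-masking context. The sketch below is only an illustration under assumptions: calculateDifferencesLoop, statCols, and the other local names are hypothetical, and it assumes every group contains exactly one row for each reference year.

```r
library(dplyr)

grades <- data.frame(
  Group = c(rep("A", 4), rep("B", 4)),
  Year = rep(seq(2015, 2018), 2),
  Mean = c(seq(100, 130, 10), seq(200, 260, 20)),
  PassR = c(seq(0.5, 0.53, 0.01), seq(0.6, 0.66, 0.02))
)

# Hypothetical loop-based variant: one mutate() call per (statistic, year) pair.
# Assumes each group has exactly one row per reference year.
calculateDifferencesLoop <- function(data, diffYears, statCols = c("Mean", "PassR")) {
  # Only process the statistic columns that are actually present
  for (statCol in intersect(statCols, names(data))) {
    for (refYear in diffYears) {
      newName <- paste0(statCol, "_Diff", refYear)
      # .data[[statCol]] is the current group's slice of the column, so the
      # reference value is looked up within the same group
      data <- mutate(
        data,
        "{newName}" := .data[[statCol]] - .data[[statCol]][.data[["Year"]] == refYear]
      )
    }
  }
  data
}

result <- grades |>
  group_by(Group) |>
  calculateDifferencesLoop(c(2015, 2016))
```

Because the grouping is preserved across the successive mutate() calls, each difference is taken against the reference year within the same group. A group that lacks one of the reference years would leave nothing to subtract, and the call would be expected to fail with a size error rather than fill in missing values.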