I do not have a pressing use case, but I would like to understand how tidy eval and data.table can work together.
I have working alternative solutions, so I am mostly interested in the why: a better understanding of tidy eval in general would help me in a wide variety of use cases.
How to make data.table + tidy eval work with group by?
In the following examples I used the development version of rlang.
Update
I updated my original question based on Stefan F's answer and my own further exploration. I no longer think the inserted ~ is a significant part of the question, since it is present in the dplyr code as well. What I still do not understand is one specific combination, data.table + group by + quo, and why it does not work.
# setup ------------------------------------
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("rlang"))
suppressPackageStartupMessages(library("dplyr"))
#> Warning: package 'dplyr' was built under R version 3.5.1
dt <- data.table(
  num_campaign = 1:5,
  id = c(1, 1, 2, 2, 2)
)
df <- as.data.frame(dt)
# original question ------------------------
aggr_expr <- quo(sum(num_campaign))
q <- quo(dt[, aggr := !!aggr_expr][])
e <- quo_get_expr(q)
e
#> dt[, `:=`(aggr, ~sum(num_campaign))][]
dt[, `:=`(aggr, ~sum(num_campaign))][]
#> Error in `[.data.table`(dt, , `:=`(aggr, ~sum(num_campaign))): RHS of assignment is not NULL, not an an atomic vector (see ?is.atomic) and not a list column.
eval_tidy(e, data = dt)
#>    num_campaign id aggr
#> 1:            1  1   15
#> 2:            2  1   15
#> 3:            3  2   15
#> 4:            4  2   15
#> 5:            5  2   15
Using an expression instead of a quosure is not safe in this case, because variables in the user-supplied expression might not be evaluated in the right environment:
# updated question --------------------------------------------------------
aggr_dt_expr <- function(dt, aggr_rule) {
  aggr_expr <- enexpr(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_expr][])
  eval_tidy(q, data = dt)
}
x <- 1L
# expression is evaluated with x = 2
aggr_dt_expr(dt, sum(num_campaign) + x)
#>    num_campaign id aggr
#> 1:            1  1   17
#> 2:            2  1   17
#> 3:            3  2   17
#> 4:            4  2   17
#> 5:            5  2   17
aggr_dt_quo <- function(dt, aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_quo][])
  eval_tidy(q, data = dt)
}
x <- 1L
# expression is evaluated with x = 1
aggr_dt_quo(dt, sum(num_campaign) + x)
#>    num_campaign id aggr
#> 1:            1  1   16
#> 2:            2  1   16
#> 3:            3  2   16
#> 4:            4  2   16
#> 5:            5  2   16
I run into a concrete problem when I add a group by:
# using group by --------------------------------
grouped_aggr_dt_expr <- function(dt, aggr_rule) {
  # enexpr() captures a bare expression, not a quosure
  aggr_expr <- enexpr(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_expr, by = id][])
  eval_tidy(q, data = dt)
}
# group by has effect but x = 2 is used
grouped_aggr_dt_expr(dt, sum(num_campaign) + x)
#>    num_campaign id aggr
#> 1:            1  1    5
#> 2:            2  1    5
#> 3:            3  2   14
#> 4:            4  2   14
#> 5:            5  2   14
grouped_aggr_dt_quo <- function(dt, aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_quo, by = id][])
  eval_tidy(q, data = dt)
}
# group by has no effect (x = 1 is used, but the sum runs over all rows)
grouped_aggr_dt_quo(dt, sum(num_campaign) + x)
#>    num_campaign id aggr
#> 1:            1  1   16
#> 2:            2  1   16
#> 3:            3  2   16
#> 4:            4  2   16
#> 5:            5  2   16
# using dplyr works fine ------------------------------------------------------------
grouped_aggr_df_quo <- function(df, aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(mutate(group_by(df, id), !!aggr_quo))
  eval_tidy(q)
}
grouped_aggr_df_quo(df, sum(num_campaign) + x)
#> # A tibble: 5 x 3
#> # Groups:   id [2]
#>   num_campaign    id `sum(num_campaign) + x`
#>          <int> <dbl>                   <int>
#> 1            1     1                       4
#> 2            2     1                       4
#> 3            3     2                      13
#> 4            4     2                      13
#> 5            5     2                      13
I understand that extracting expressions from quosures is not how tidy eval is meant to be used, but I hoped to use it as a debugging tool (without much luck so far):
# returning expression in quo for debugging --------------
grouped_aggr_dt_quo_debug <- function(dt, aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_quo, by = id][])
  quo_get_expr(q)
}
grouped_aggr_dt_quo_debug(dt, sum(num_campaign) + x)
#> dt[, `:=`(aggr, ~sum(num_campaign) + x), by = id][]
grouped_aggr_df_quo_debug <- function(df, aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(mutate(group_by(df, id), !!aggr_quo))
  quo_get_expr(q)
}
# ~ is inserted in this case as well so it is not the problem
grouped_aggr_df_quo_debug(df, sum(num_campaign) + x)
#> mutate(group_by(df, id), ~sum(num_campaign) + x)
Created on 2018-08-12 by the reprex package (v0.2.0).
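For inspection purposes, something like the following might show more than quo_get_expr() alone. It is only a minimal sketch, assuming rlang's quo_squash(), quo_get_env() and env_get(); inspect_aggr_quo is a hypothetical helper that is not part of the reprex above. It builds the same call as grouped_aggr_dt_quo() but never evaluates it, so it does not explain the grouping behaviour; it only shows which expression and which environments are involved.

```r
library(rlang)

# Hypothetical helper, for inspection only: builds the same call as
# grouped_aggr_dt_quo() but returns details about it instead of evaluating it.
inspect_aggr_quo <- function(aggr_rule) {
  aggr_quo <- enquo(aggr_rule)
  x <- 2L
  q <- quo(dt[, aggr := !!aggr_quo, by = id][])
  list(
    squashed  = quo_squash(q),          # nested quosure replaced by its expression
    inner_env = quo_get_env(aggr_quo),  # environment captured with the user expression
    outer_env = quo_get_env(q)          # environment of the wrapper call (this function's frame)
  )
}

x <- 1L
info <- inspect_aggr_quo(sum(num_campaign) + x)

# The call without any embedded quosure
info$squashed

# x as seen directly in each captured environment
env_get(info$inner_env, "x")  # the caller's x
env_get(info$outer_env, "x")  # the x defined inside the helper
```

The two environments differ, which matches the x = 1 versus x = 2 results above: the inner quosure carries the caller's environment, while the wrapper call carries the function's own frame.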
Original wording of the question:
Why is a ~ inserted, and why is it not a problem with tidy eval when it is a problem with base eval, even though everything is in the global environment?
This example is derived from a more realistic but also more complicated use case where I got unexpected results.