How can I ensure that my nonstandard evaluation[1] using data.table is inheriting the variables it needs from the parent frame?
Based on my understanding of dynamic scope, my code below should work, but it doesn't. What am I doing wrong?
Details
I have a list of many functions that I want to apply to a single data.table. Each rule returns a boolean check and a message (for when the check is TRUE). For example, let's say I am auditing a table of accounts.
library(data.table)
#----- Example data -----------------------------------------------------------
n <- 100
set.seed(123)
df <- data.table(
  acct_id      = paste0('ID', seq(n)),
  acct_balance = round(pmax(rnorm(n, 1000, 5000), 0)),
  days_overdue = round(pmax(rnorm(n, 20, 20), 0))
)
#----- Example list of rules to check (real case has more elements)------------
AuditRules <- list(
  list(
    msg_id  = 1,
    msg_cat = 'Balance',
    cond_fn = function(d) d[, acct_balance > balance_limit ],
    msg_txt = function(d) d[, paste('Account', acct_id, 'balance is',
                                    acct_balance - balance_limit,
                                    'over the limit.')]
  ),
  list(
    msg_id  = 2,
    msg_cat = 'Overdue',
    cond_fn = function(d) d[, days_overdue > grace_period ],
    msg_txt = function(d) d[, paste('Account', acct_id, 'is overdue',
                                    days_overdue - grace_period,
                                    'days beyond grace period.')]
  )
)
I am looping through the list of rules and checking the dataset on each.
Desired Output
This works fine in the global environment.
balance_limit <- 1e4
grace_period <- 14
audit <- rbindlist(
  lapply(AuditRules, function(item){
    with( item,
          d <- NULL,
          df[ cond_fn(df),
              .(msg_id,
                msg_cat,
                msg_txt = msg_txt(.SD) )
            ]
    )
  } )
)
print(head(audit), row.names=FALSE)
#----------------- Result --------------------------------------
# msg_id  msg_cat  msg_txt
#      1  Balance  Account ID44 balance is 1845 over the limit.
#      1  Balance  Account ID70 balance is 1250 over the limit.
#      1  Balance  Account ID97 balance is 1937 over the limit.
#      2  Overdue  Account ID2 is overdue 11 days beyond grace period.
#      2  Overdue  Account ID3 is overdue 1 days beyond grace period.
#      2  Overdue  Account ID6 is overdue 5 days beyond grace period.
What doesn't work (and needs a solution)
rm(balance_limit, grace_period) # see "aside"
auditTheData <- function(d, balance_limit = 1e4, grace_period = 14){
  rbindlist(
    lapply(AuditRules, function(item){
      with( item,
            d[ cond_fn(d),
               .(msg_id,
                 msg_cat,
                 msg_txt = msg_txt(.SD) )
             ]
      )
    } )
  )
}
auditTheData(df)
results in the error:
Error in eval(jsub, SDenv, parent.frame()) : object 'balance_limit' not found
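The same kind of lookup failure can be reproduced in base R without data.table, which suggests the issue is where the rule functions were defined rather than anything specific to `[.data.table`. This is a minimal sketch; `rule_fn`, `run_rule`, and `demo_limit` are hypothetical names used only for illustration.

```r
# Hypothetical names throughout (rule_fn, run_rule, demo_limit);
# none of these come from the audit code above.

# demo_limit is a free variable: it is neither an argument nor a local.
rule_fn <- function(x) x > demo_limit

run_rule <- function(x, demo_limit = 10) {
  # demo_limit exists in this frame, but rule_fn() was defined outside it,
  # so R resolves demo_limit starting from rule_fn's defining environment,
  # not from this calling frame.
  rule_fn(x)
}

# Capture the error instead of stopping the script.
res <- tryCatch(run_rule(5), error = function(e) e)
print(inherits(res, "error"))
print(conditionMessage(res))
```

Here `run_rule()` plays the role of `auditTheData()` and `rule_fn()` the role of `cond_fn`: the argument is in the caller's frame, but the called function never searches that frame.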
It's not a problem with with(), although I've read (in ?with) that one should typically refrain from using it for programming. This also doesn't work:
auditTheData2 <- function(d, balance_limit = 1e4, grace_period = 14){
  rbindlist(
    lapply(AuditRules, function(item){
      d[ item[['cond_fn']](d),
         .(msg_id,
           msg_cat,
           msg_txt = item[['msg_txt']](.SD) )
       ]
    } )
  )
}
auditTheData2(df) # Same error
Aside: if you don't do rm(balance_limit, grace_period) before the "what doesn't work" function (i.e. you leave them in the global environment), you get the desired results. So it seems like the function(item) that is getting lapply-ed can "see" into the global environment but not the parent environment (auditTheData).
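The behaviour in the aside matches R's lexical scoping: a function's free variables are resolved from the environment where the function was defined (here, the global environment), not from the frame that calls it. The sketch below shows this in base R and one possible way to redirect the lookup by re-pointing a copy of the function at the calling frame. All names (`rule_fn`, `run_rule_global`, `run_rule_rescoped`, `demo_limit`) are hypothetical, and this is not tested against the data.table example above, so treat it as an illustration rather than a recommended fix.

```r
# Hypothetical names throughout; none come from the audit code above.
rule_fn <- function(x) x > demo_limit   # demo_limit is a free variable

run_rule_global <- function(x, demo_limit = 10) rule_fn(x)

# With a global demo_limit present, rule_fn() finds that one,
# and ignores the demo_limit argument of its caller.
demo_limit <- 3
from_global <- run_rule_global(5)   # compares 5 against the global 3

run_rule_rescoped <- function(x, demo_limit = 10) {
  # Work on a local copy whose enclosing environment is this frame,
  # so free variables are looked up among this function's arguments first.
  local_fn <- rule_fn
  environment(local_fn) <- environment()
  local_fn(x)
}
rescoped <- run_rule_rescoped(5)    # compares 5 against the argument 10

print(from_global)
print(rescoped)
```

An alternative that avoids touching environments would be to give each rule function explicit arguments for its thresholds and pass them in from the caller, at the cost of changing every rule's signature.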
[1] I'm using "non-standard" in the unscientific sense of "unusual" here. I don't know what counts as non-standard, but that's another (and possibly too broad) question.