I'm building some utility functions to simplify writing cast(statement as type)
in an SQL query easier from R.
The way I'm doing so is through one workhorse function, as_type
which is called by several one-use functions (e.g. as_bigint
); crucially, I also think calling as_type
directly is a valid use case.
The basic structure of the code is like:
as_type = function(x, type) {
if (is.na(match(type, known_types())))
stop("Attempt to cast to unknown type ", type)
sprintf('cast(%s as %s)', deparse(substitute(x, parent.frame())), type)
}
as_bigint = function(x) as_type(x, 'bigint')
known_types = function() 'bigint'
# more complicated than this but for works the purposes of this Q
query_encode = glue::glue
With expected usages like
query_encode("select {as_bigint('1')}")
query_encode("select {as_type('1', 'bigint')}")
(in reality there are several more valid types and as_
functions for other valid SQL types; only query_encode
is exported)
Unfortunately, calling as_type
directly fails because, as noted in ?substitute
(h/t Brodie G on Twitter):
If [a component of the parse tree] is not a bound symbol in [the second argument to
substitute
]env
, it is unchanged
query_encode("select {as_bigint('1')}")
# select cast("1" as bigint)
query_encode("select {as_type('1', 'bigint')}")
# select cast(x as bigint)
I've cooked up the following workaround but it hardly feels robust:
as_type = function(x, type) {
if (is.na(match(type, known_types())))
stop("Attempt to cast to unknown Presto type ", type)
prev_call = as.character(tail(sys.calls(), 2L)[[1L]][[1L]])
valid_parent_re = sprintf('^as_(?:%s)$', paste(known_type(), collapse = '|'))
eval_env =
if (grepl(valid_parent_re, prev_call)) parent.frame() else environment()
sprintf(
'cast(%s as %s)',
gsub('"', "'", deparse(substitute(x, eval_env)), fixed = TRUE),
type
)
}
I.e., examine sys.calls()
and check if as_type
is being called from one of the as_
functions; set env
argument to substitute
as parent.frame()
if so, current environment if not.
This works for now:
query_encode("select {as_bigint('1')}")
# select cast("1" as bigint)
query_encode("select {as_type('1', 'bigint')}")
# select cast("1" as bigint)
The question is, is this the best way of going about this? Phrased as such, it feels like an opinion-based question, but what I mean is -- (1) is this approach as fragile as it feels like at first glance and (2) assuming so, what's an alternative that is more robust?
E.g. it's notable that is.name(x)
is FALSE
from as_type
, but it's not clear to me how to use this to proceed.