0

I'm building some utility functions to simplify writing cast(statement as type) in an SQL query easier from R.

The way I'm doing so is through one workhorse function, as_type which is called by several one-use functions (e.g. as_bigint); crucially, I also think calling as_type directly is a valid use case.

The basic structure of the code is like:

as_type = function(x, type) {
  if (is.na(match(type, known_types())))
    stop("Attempt to cast to unknown type ", type)
  sprintf('cast(%s as %s)', deparse(substitute(x, parent.frame())), type)
}

as_bigint = function(x) as_type(x, 'bigint')

known_types = function() 'bigint'

# more complicated than this but for works the purposes of this Q
query_encode = glue::glue

With expected usages like

query_encode("select {as_bigint('1')}")
query_encode("select {as_type('1', 'bigint')}")

(in reality there are several more valid types and as_ functions for other valid SQL types; only query_encode is exported)

Unfortunately, calling as_type directly fails because, as noted in ?substitute (h/t Brodie G on Twitter):

If [a component of the parse tree] is not a bound symbol in [the second argument to substitute] env, it is unchanged

query_encode("select {as_bigint('1')}")
# select cast("1" as bigint)
query_encode("select {as_type('1', 'bigint')}")
# select cast(x as bigint)

I've cooked up the following workaround but it hardly feels robust:

as_type = function(x, type) {
  if (is.na(match(type, known_types())))
    stop("Attempt to cast to unknown Presto type ", type)
  prev_call = as.character(tail(sys.calls(), 2L)[[1L]][[1L]])
  valid_parent_re = sprintf('^as_(?:%s)$', paste(known_type(), collapse = '|'))
  eval_env = 
    if (grepl(valid_parent_re, prev_call)) parent.frame() else environment()
  sprintf(
    'cast(%s as %s)',
    gsub('"', "'", deparse(substitute(x, eval_env)), fixed = TRUE),
    type
  )
}

I.e., examine sys.calls() and check if as_type is being called from one of the as_ functions; set env argument to substitute as parent.frame() if so, current environment if not.

This works for now:

query_encode("select {as_bigint('1')}")
# select cast("1" as bigint)
query_encode("select {as_type('1', 'bigint')}")
# select cast("1" as bigint)

The question is, is this the best way of going about this? Phrased as such, it feels like an opinion-based question, but what I mean is -- (1) is this approach as fragile as it feels like at first glance and (2) assuming so, what's an alternative that is more robust?

E.g. it's notable that is.name(x) is FALSE from as_type, but it's not clear to me how to use this to proceed.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • 1
    Without thinking much about it, why not have a `.as_type` that expects `x` already substituted, an then all the `as_*` functions, including `as_type`, would substitute, and call `.as_type`? Also, note you probably want `paste0(deparse(...), collapse='\n')`. – BrodieG Jul 27 '19 at 12:26
  • This does not seem simpler than just writing out `cast...` in the SQL statement but ignoring that if the problem is to evaluate the argument of `substitute` we can write: `do.call("substitute", list(...whatever...))` – G. Grothendieck Jul 27 '19 at 14:48
  • @g.grothendieck ymmv – MichaelChirico Jul 27 '19 at 15:27

2 Answers2

1

Here is the possible alternate approach I allude to in the comments:

.as_type <- function(x_sub, type) {
  if(!isTRUE(type %in% known_types()))
    stop("Attempt to cast to unknown type ", type)
  sprintf('cast(%s as %s)', deparse(paste0(x_sub, collapse='\n')), type)
}
as_bigint <- function(x) .as_type(substitute(x), 'bigint')
as_type <- function(x, type) .as_type(substitute(x), type)
known_types <- function() 'bigint'
query_encode <- glue::glue

Then

query_encode("select {as_bigint('1')}")
## select cast("1" as bigint)
query_encode("select {as_type('1', 'bigint')}")
## select cast("1" as bigint)

In terms of what you actually want to do, I think we're stuck with variations on what you're doing, and I agree it feels a bit dirty. This is dirty in a different way, but not that dirty and seems like it might work. The only dirtyness really is the need to have each function call substitute, but that's not that big a deal.

In terms of fragility, to the extent you don't export the as_ functions, then it seems okay, although it does feel odd not to export those functions. I would export them, but if you do that, then you need far more robust checking as then people can rename the functions, etc. One thing to watch out for is that it is possible for the compiler to mess with frame counts. It really shouldn't, but Luke Tierney seems more comfortable doing that than I would be.

BrodieG
  • 51,669
  • 9
  • 93
  • 146
  • I actually had this version as of last week the reason I changed it this week was a feeling it's a bit repetitive having `substitute` 10-12 (=`length(known_types())`) times for each function, was looking for a way around that. also had an issue where substitute swaps `'` for `"`, which is unavoidable & meaningful for when it gets passed to SQL... so I thought it cleaner to have that& any other potential cleanup all done in `as_type` if possible – MichaelChirico Jul 27 '19 at 12:58
  • 1
    the difference is I was doing `deparse(substitute())` both at the `as_bigint` level, maybe separating only the `substitute` out (as you've done) allows a compromise of the two approaches... will need to let it simmer a bit – MichaelChirico Jul 27 '19 at 13:00
1

I believe you might have overlooked glue transformers. Going from character to call to end up with character again is a big detour that you don't need to take.

Transformers allow you to apply functions to the glue input and output, before and after evaluation, you can read more about them here. Keeping your format we can build :

library(glue)

cast_transformer <- function(regex = "as_(.*?)\\((.*)\\)$", ...) {
  function(text, envir) {
    type <- sub(regex, "\\1", text)
    known_types <- "bigint"
    if(type %in% known_types)
    {
      val <- sub(regex, "\\2", text)
      glue("cast({val} as {type})")
    } else {
      eval(parse(text = text, keep.source = FALSE), envir)
    }
  }
}

glue("select {as_bigint('1')}",.transformer = cast_transformer())
#> select cast('1' as bigint)

Because we are now parsing the expression, there is no function as_bigint, you can still keep the syntax if it's convenient to you but nothing stops you from simplifying it to something like :

glue("select {bigint: '1'}",.transformer = cast_transformer("(\\D+): (.*)$"))
#> select cast('1' as bigint)

Choose the default regex that you like and define the wrapper query_encode <- function(query) glue(query, .transformer = cast_transformer()) and you're good to go.

moodymudskipper
  • 46,417
  • 11
  • 121
  • 167