Using sqldf with r variable containing underscore in its name

Question

This code

> A <- data.frame(col1 = c(1,2,3),col2 = c("red","blue","green"))
> color_num <- 2
> fn$sqldf("select * from A where col1 >= '$color_num'")

yields the error

Error in eval(parse(text = paste(..., sep = "")), env) : object 'color' not found

But if the variable color_num is instead given a name with no underscore (say colornum), then executing fn$sqldf("select * from A where col1 >= '$colornum'") yields the expected results with no error.

I believe sqldf is replacing underscores with periods behind the scenes, causing it to treat the component preceding the underscore as a table and the part following as a column name. This answer (and comments) to a question about column names in sqldf indicates that the library at one time replaced dots with underscores but no longer does, but I couldn't find anything about underscores being replaced with dots.

This is an issue since the naming convention I'm using makes heavy use of underscores for variable names.

Is there any way to get variable names with underscores in them working in sqldf queries?

score 2 · Accepted Answer · answered Nov 07 '18 at 14:32

2

You can use backticks:

fn$sqldf("select * from A where col1 >= `color_num`")
##   col1  col2
## 1    2  blue
## 2    3 green

answered Nov 07 '18 at 14:32

G. Grothendieck

254,981
17
203
341

Mike · Answer 2 · 2018-11-06T20:11:19.620

1

You can use paste0 around your sql code so that r evaluates the color_num to 2 and then pastes it together into one sql statement.

library(sqldf)
    A <- data.frame(col1 = c(1,2,3),col2 = c("red","blue","green"))
    color_num <- 2

    fn$sqldf(paste0("select * from A where col1 >=",color_num))

If you want to use the `$var` approach and are having trouble with `_` here is a work around to get all variables to have . instead of _ , this is probably inefficient but works.

color_col <- "blue"
#get list of values with underscores and not your df
nms <- setdiff(ls(),c("A"))
#change name of list to have '.' instead of '_'
setNames(nms,gsub("_",".",nms))
#Assign values to names with '.'s
myvars <- lapply(setNames(nms,gsub("_",".",nms)), function(x){
  assign(gsub("_",".",x) , get(x))})
#bring them to global env
list2env(myvars,.GlobalEnv)

#run example query
fn$sqldf("select * from A where col1 >= '$color.num' and col2 = '$color.col' ")

edited Nov 06 '18 at 20:11

answered Nov 06 '18 at 14:23

Mike

3,797
1
11
30

Thanks @Mike. I've tried wrapping `[ ]` around variable names to no avail. Regarding constructing the string with `paste0`, this isn't ideal since variable names can appear in many places within a complicated sql string, so I'd have to `paste0` many different substrings together. The ideal solution would allow the standard variable name usage in `sqldf` to accommodate underscores, which are commonly found in variable names. Is there a way to do this? – Rookatu Nov 06 '18 at 15:53
I see the problem. This might be a bug in the `sql` package, Here is a solution where you can change the `_` to `.` so your code will work. If you like this I will expand it using `lapply` to change all of your variables. -`assign(gsub("_",".",setdiff(ls(),c("A") )) ,get(setdiff(ls(),c("A") )))` – Mike Nov 06 '18 at 16:43

Using sqldf with r variable containing underscore in its name

2 Answers2

If you want to use the $var approach and are having trouble with _ here is a work around to get all variables to have . instead of _ , this is probably inefficient but works.

If you want to use the `$var` approach and are having trouble with `_` here is a work around to get all variables to have . instead of _ , this is probably inefficient but works.