2

is there any way to put sqldf queries inside user defined functions? I have gone through this : http://r.789695.n4.nabble.com/Passing-Multiple-Variable-Into-SQLDF-Statement-as-parameters-of-function-td4636147.html , R call variable inside sqldf.

My sample code would be like this :

db1 = data.frame(a = c(1,2,3), b = c("a","b","c"))
db2 = data.frame(a = c(1,2,3), b = c("b","a","c"))
db = list(db1,db2)

extrct = function(x){
Example=paste0("select * from", x , "where b
='","b", "'")
sqldf(Example,verbose=TRUE) 
}

I have many databases and it would be quite easy to write SAS macro like codes to extract data as long as sqldf works within a function. Else I have written R codes for some small processes but there are many with complex SQL procedures which would be much easier in sqldf. Thanks in advance.

Sayak
  • 183
  • 1
  • 11
  • 1
    Your SQL expression is not well formed. Use this line, and you will be good to go: `Example=paste0("select * from ", 'x' , " where b='","b", "'", collapse="")`. Remember to test the internal parts of your functions to make sure that they produce the expected output. – lmo Nov 11 '17 at 19:22

2 Answers2

3

Try this:

library(sqldf)

extract <- function(x, envir = parent.frame(), verbose = TRUE, ...) {
  fn$sqldf("select * from [$x] where b = 'b'", envir = envir, verbose = verbose, ...)
}

# sample runs
extract("db1")
extract("db2")

Map(extract, c("db1", "db2"))

db <- setNames(db, c("db1", "db2"))
lapply(names(db), extract, envir = list2env(db))

If we change the last line to this then the output will have component names but is otherwise the same:

sapply(names(db), extract, envir = list2env(db), simplify = FALSE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2

This is how I'd write it. sqldf more naturally handles strings, rather than using nse. So just pass in the string/name of the data.frame you want to use a source.

library(sqldf); requireNamespace("checkmate")
db1 <- data.frame(a = c(1,2,3), d = c("a","b","c"), stringsAsFactors = F)

extract <- function( table_name, criteria_d ) {
  checkmate::assert_character(table_name, min.chars=1, len=1, any.missing=F)
  checkmate::assert_character(criteria_d, min.chars=1, len=1, any.missing=F)

  # Half-way attempt to prevent sql-injection. Values would need to be only numbers, letters, and underscores.
  checkmate::assert_character(table_name, pattern="^\\w+$", len=1, any.missing=F)
  checkmate::assert_character(criteria_d, pattern="^\\w+$", len=1, any.missing=F)

  sql <- paste0("select * from [", table_name , "] where d ='", criteria_d, "'")

  cat("Executing: `", sql, "`\n", sep="")
  sqldf(sql, verbose=F) 
}

extract("db1", "b")

If for some reason you couldn't know the string/name of the variable, this is equivalent: extract(quote(db1), "b").

A few notes.

  1. I changed the name of the variable to 'd' to make things more clear.
  2. I assumed that db2 and db weren't relevant to your scenario.
  3. I tried not to change your code too much. If this function is ever connected to a real database, protect against sql injection.
  4. If your sql gets a little more complicated, consider using glue::glue_sql().

Edit in response to @Sayak's comment:

Use purrr::map_df() to loop through a vector of data.frame names

c("db1", "db2") %>% 
  purrr::map_df(extract, "b") 

and combines the results into a single data.frame:

Executing: `select * from [db1] where d ='b'`
Executing: `select * from [db2] where d ='b'`
  a d
1 2 b
2 1 b

This is pretty slick that it doesn't require a subsequent call to dplyr::bind_rows().

If you need to vary the criteria (ie, so it's not always "b"), use purrr::pmap_df() with packaging the inputs as a data.frame (whose columns match the parameters of your extract() function:

ds_input <- tibble::tribble(
  ~table_name, ~criteria_d,
  "db1",         "b",
  "db1",         "c",
  "db2",         "c"
)

ds_input %>% 
  purrr::pmap_df(extract)

# Executing: `select * from [db1] where d ='b'`
# Executing: `select * from [db1] where d ='c'`
# Executing: `select * from [db2] where d ='c'`
#   a d
# 1 2 b
# 2 3 c
# 3 3 c
wibeasley
  • 5,000
  • 3
  • 34
  • 62
  • Thanks a lot for this answer. Just one question please, if I have multiple dataframes and same column in every one of them , can if define extract "b" similarly and lapply through the list of data frames? Or is there any way to do the same with the function extract? – Sayak Nov 11 '17 at 20:04