0

I am trying to fetch data through RODBC package from Vertica DB. I currently have a SQL query like the one below.

library(rodbc) channel = odbcconnect("VerticaDB") query = paste
(
       SELECT *
       FROM   item_history
       WHERE  item_exp_date BETWEEN ",x," AND    ",y,"
       AND    item_code IN ('A1',
                            'A2',
                            'B1',
                            'B2')",sep="")result = (sqlQuery(channel,query)
) 

I have been able to parameterize the data passed in the 'BETWEEN' clause. Is there a way i can parameterize data being passed in the 'IN' clasue?

Also the number of data elements being passed in the 'IN' clause is very high (over 100 distinct items).

Is there a way if it can be passed from an external Vector or a file?

mxix
  • 3,539
  • 1
  • 16
  • 23
Srivathsan V
  • 33
  • 1
  • 8
  • 2
    To OP and future readers -please be aware that neither OP's attempt or accepted answer is a true parameterized SQL query. @Benjamin is a better attempt whereas others are simply concatenating a dynamic SQL string. – Parfait Aug 25 '16 at 12:46

2 Answers2

2

What you've is SQL injection and not a parameterized query. You may want to look at the RODBCext package and its vignette.

To properly parameterize the query, you may do

library(RODBC)
library(RODBCext)
channel = odbcConnect("VerticaDB")
query = paste0("select * from Item_History ",
               "where Item_Exp_Date between ? and ? ",
               "and Item_Code = ?")
item <- c("A1", "A2", "B1", "B2")
x <- 3
y <- 10 # I don't actually know what your x and y are, but hopefully you get the idea
sqlExecute(
  channel = channel,
  query = query,
  data = list(x = rep(x, length(item)),
              y = rep(y, length(item)),
              item = item),
  fetch = TRUE,
  stringsAsFactors = FALSE
)

This has big disadvantage, however, because sqlExecute will run a query for each row in the data argument (the list will be coerced to a data frame). If you have hundreds of elements in your item vector, you will run hundreds of queries on your SQL instance, which may not be particularly efficient.

A less obvious way of doing this write a stored procedure to construct the query.

Your stored procedure in SQL might look like

CREATE PROCEDURE schema.specialQuery 
  @x int;
  @y int;
  @in varchar(2000);
AS
BEGIN
  DECLARE @query = varchar(8000);
  SET @query =  'select * from Item_History ' + 
                'where Item_Exp_Date between ' + convert(@x, varchar(10)) + 
                        ' and ' + convert(@y, varchar(10)) + 
                        ' and Item_Code IN (' + @in ')'
  EXEC @query
END
GO

You may need to fiddle with the convert functions and some of the quotes, but it will work with

sqlExecute(
  channel = channel,
  query = "EXECUTE schema.specialQuery @x = ?, @y = ?, @in = ?",
  data = list(x = x,
              y = y,
              in = sprintf("'%s'", paste0(item, collapse = "', '"))),
  fetch = TRUE,
  stringsAsFactors = FALSE
)

Unfortunately, this approach is still susceptible to problems with poorly formatted character strings being passed via item, but it is likely faster than running hundreds of queries in the first approach I displayed.

Benjamin
  • 16,897
  • 6
  • 45
  • 65
1

To do this with string manipulation as in the question:

x <- "2000-01-01"
y <- "2001-01-01"
Item_Code <- c('A1','A2','B1','B2')

query <- sprintf("select * from Item_History
                  where Item_Exp_Date between '%s' and '%s'
                        and Item_Code in (%s)", x, y, toString(shQuote(Item_Code, 'sh')))

We could alternately use fn$ from the gsubfn package for string interpolation:

library(gsubfn)
query2 <- fn$identity("select * from Item_History
              where Item_Exp_Date between '$x' and '$y'
              and Item_Code in ( `toString(shQuote(Item_Code, 'sh'))` )")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341