0

How can I pass a column name as a parameter to SQL in R sqldf?

Neither

q <- "Q10"

A = fn$sqldf('SELECT * FROM Customer WHERE $q < 100')

or

q <- "Q10"

A = fn$sqldf('SELECT * FROM Customer WHERE '$q' < 100')

works.

joran
  • 169,992
  • 32
  • 429
  • 468
paulkp
  • 27
  • 1
  • 3
  • Define "works". The first one looks ok if Q10 is a numeric column in Customer. Suggest you make your example reproducible by providing `dput(head(Customer))`. – G. Grothendieck Nov 25 '13 at 03:44
  • It's a string column. The column's name is "Q10". I get an error: Error in source("~/Paul/R/DTA/Success rate123_2.R", echo = TRUE) : ~/Paul/R/DTA/Success rate123_2.R:3:63: unexpected string constant 2: q <- "Q10" 3: pred01 =fn$sqldf('SELECT * FROM CustomerQ_totals123T WHERE '$q' < $fract*Q9 AND Total > -1 ORDER BY Total DESC' ^ – paulkp Nov 25 '13 at 03:48
  • NEITHER of the above work. – paulkp Nov 25 '13 at 03:54
  • 4
    Just so you know, it's a little rude to SHOUT AT THE PEOPLE TRYING TO HELP YOU. Particularly, when that person is the author of the package you are using. – joran Nov 25 '13 at 04:37
  • Sorry, I just use capitals for emphasis in all my communications, email or otherwise! I thought it seemed that GG needed the NEITHER emphasised. I'm not a regular on forums in the last 10 years so I'm 10 years out of date in net etiquette. – paulkp Nov 25 '13 at 05:04
  • > head(CustomerQ_totals123T) Customer_code Industry_sector Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Total 1 1000019 Distributor 0.00 0.00 0 0.00 0.0 0.00 0.00 0.00 0.00 420.0 0.0 0.00 420.00 2 1000024 Private Hospital 0.00 0.00 0 0.00 0.0 0.00 0.00 0.00 0.00 0.0 0.0 0.00 0.00 – paulkp Nov 25 '13 at 05:06
  • The Q10 column is numeric. It's title is string obviously. – paulkp Nov 25 '13 at 05:09
  • OK, I found a trivial bug. Thanks and apologies all! – paulkp Nov 25 '13 at 05:11
  • 2
    OK, I found a trivial error. Sorry folks. Yes, the FIRST listing does work. – paulkp Nov 25 '13 at 07:01

3 Answers3

4

sprintf() will work charmingly in this case. Personally, I find code using sprintf() to be more easy to read (and even write) than paste().

q <- "Q10"
sql <- sprintf("SELECT * FROM Customer WHERE %s < 100", q)

sql
[1] "SELECT * FROM Customer WHERE Q10 < 100"

In fact, you can go much further with compiling complex SQL queries. Just to illustrate:

q <- "Q10"
value <- "150"
sql <- sprintf("SELECT * FROM Customer WHERE %s < %s", q, value)

sql
[1] "SELECT * FROM Customer WHERE Q10 < 150"
Rahul Premraj
  • 1,595
  • 14
  • 13
  • I do `sql <- sprintf("SELECT * FROM a, b WHERE a.name=%s", 'Leo'); DF <- sqldf( sql );`. I get `Error in rsqlite_send_query(conn@ptr, statement) : no such column: Leo`, although `Leo` column exists and the command `sqldf("SELECT * FROM a,b WHERE a.name='Leo'")` works independently. What do you think? Can you please clarify your answer? - - You need `sql <- "... '%s'"`. Also here https://stackoverflow.com/a/36881772/54964 – Léo Léopold Hertz 준영 May 24 '17 at 14:40
2

Solution:

You can use paste command to concatenate variable and string.

> library(sqldf)
> my_names <- names(sqldf("select * from iris limit 10"))
> sqldf(paste("select",my_names[1], "from iris limit 2", sep=" "))
      Sepal_Length
1          5.1
2          4.9

or using fn$ to perform the string interpolation, the last line could be written:

> fn$sqldf("select `my_names[1]` from iris limit 2")

Potential Problem:

The names might be different when you try to use sqldf to query. For example, the default names for dataset iris is:

> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species" 

However, the names have to be formatted in a different way when you try to use the column names to make queries:

> sqldf("select * from iris limit 1")
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.1         3.5          1.4         0.2  setosa    

This is the official explaination:

Here is an example of sorting and limiting output from an SQL select statement on the iris data frame that comes with R. Note that although the iris dataset uses the name Sepal.Length the RSQLite layer converts that to Sepal_Length.

So that is the reason in my solution I first created the my_names variable from sqldf select statement instead of names(iris).

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
B.Mr.W.
  • 18,910
  • 35
  • 114
  • 178
0

For me the sprintf is the perfect solution, you need to change the %s by %d for integers i think.

q= 'Q10' ; value= 150
sql <- sprintf("SELECT * FROM Customer WHERE %s < %d", q, value)

sql
[1] "SELECT * FROM Customer WHERE Q10 < 150"
Hazem HASAN
  • 1,598
  • 2
  • 21
  • 38