0

I have a datafile sales_history. I want to query it in the following way.

my_df<-sqldf("SELECT * 
                    FROM sales_history
                    WHERE Business_Unit=='RETAIL'"")

Now I want to write a function with argument datafile and column name to do the above job. So something like:

pick_column<-function(df, column_name){
  my_df<-sqldf("SELECT * 
                      FROM df
                      WHERE Business_Unit==column_name"
  return(my_df)
}

Ideally, after running the above function definition, I should then be able to run pick_column(sales_history,'RETAIL'). But when I do this, the second argument 'RETAIL' is not passed to the function correctly. What's the correct way to do this then?

I know that for this example, there are other ways to do this other than using "sqldf" for SQL query. But the point of my question here is how to pass the column_name correctly as a function argument.

zx8754
  • 52,746
  • 12
  • 114
  • 209
  • On sqldf home page see: https://code.google.com/p/sqldf/#Example_5._Insert_Variables – G. Grothendieck Jun 02 '14 at 03:12
  • Thanks @G.Grothendieck! But I am very new to R and it's hard for me to follow that page. Given the example in the question I asked, is it possible for you to write a paragraph of code illustrating the correct way to write and call such a function? Thanks very much. – user3698033 Jun 02 '14 at 03:36

1 Answers1

4

the sqldf package uses gsubfn to allow you to add names of R variables into your SQL commands by prefixing them with the "$" character. So you can write

sales_history <- data.frame(
    price=c(12,10),
    Business_Unit=c("RETAIL","BUSINESS"),
    stringsAsFactors=F
)

pick_column <- function(df, columnname) {
    fn$sqldf("SELECT * FROM $df WHERE Business_Unit='$columnname'")
}

pick_column("sales_history","RETAIL")
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • Thanks! So what is the correct way to call it? I installed and loaded gsubfn and try call it in both the following ways: pick_column(sales_history,'RETAIL') and pick_column(sales_history,RETAIL); none of them work. The error message is Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: my_df) In addition: Warning message: In value[[3L]](cond) : bind.data must have non-zero dimensions. Do you have any idea why it failed? – user3698033 Jun 02 '14 at 03:54
  • It doesn't work for me... when I add print('$column_name') to the function and call it, it prints out "$column_name" (which is supposed to output 'RETAIL'). I checked that I loaded sqldf and gsubfn. Do you have any idea again why it failed? Also, I think maybe df should not be added $df? it will produce syntax error. Just using df without $ prefix works fine for datafile argument. – user3698033 Jun 02 '14 at 04:38
  • @user3698033 I've updated with code i've been able to test. I negelected to change `sqldf` to `fn$sqldf` and it looks like it doesn't like variable names with underscore. – MrFlick Jun 02 '14 at 04:50
  • 1
    The variable name can alternately be surrounded with backticks in which case more general variable names or even entire R expressions can be used. – G. Grothendieck Jun 02 '14 at 10:14