0

I am working with a data set and I need to run several queries one after the other in order to create some new variables. The problem is that after the first query, the new variable which is created is of class data.frame. This means that subsequent queries do not run, because as explained in the following post: R- sqldf error raw vs double ... "The columns of your data.frame cannot be of class data.frame for sqldf to work"

The problem is that the above post does not tell me how to solve this problem. You can replicate my problem with the following example:

set.seed(999)
dt <- data.frame("x"=rnorm(150, 600,195))
sapply(dt, class)
library(sqldf)
dt$x2 <- sqldf("SELECT CASE
WHEN x>999 THEN x / 100
WHEN x>99  THEN x/10
ELSE x
END as x2
FROM dt")
sapply(dt, class)

The output of the last line shows that the class of X2 is data.frame. This means that when I try to run my next query, it will not work.

dt$x3 <- sqldf("SELECT CASE
WHEN x>999 THEN x - 100
WHEN x>99  THEN x - 10
ELSE x + 10
END as x3
FROM dt")

Any suggestion on how to resolve this?

TCS
  • 127
  • 1
  • 11
  • You are assigning a data.frame to a column in 'dt' – akrun Jul 14 '17 at 01:32
  • 1
    Possibly `dt$x2 <- unlist(sqldf("SELECT CASE ...)` will work. – lmo Jul 14 '17 at 01:32
  • I have provided a simplified version of my queries, which is based on a single table. In my actual research project, I am working with a relational database and I need to use sqldf to run the queries based on multiple tables. – TCS Jul 14 '17 at 01:32
  • If you are returning multiple variables at once, then you might need to use `cbind` (assuming order is preserved) or `merge` on some IDs. – lmo Jul 14 '17 at 01:34
  • What is wrong with getting your result set back as a data frame? What do you plan to do with the results of this query? – Tim Biegeleisen Jul 14 '17 at 01:40
  • Instead of `dt$x2 <- sqldf("SELECT CASE ...` try `dt <- sqldf("SELECT x, CASE ...` – G. Grothendieck Jul 14 '17 at 18:15

1 Answers1

1

How about this? Add [,1] to the end of the sqldf()

set.seed(999)
dt <- data.frame("x"=rnorm(150, 600,195))
sapply(dt, class)
library(sqldf)
dt$x2 <- sqldf("SELECT CASE
WHEN x>999 THEN x / 100
WHEN x>99  THEN x/10
ELSE x
END as x2
FROM dt")[,1]
sapply(dt, class)

sqldf returns a data frame. The sub setting pulls out the column.

Kevin
  • 319
  • 2
  • 10