0

I have a dataframe myframe with 8 variables in R. I then load the sqldf package using library(sqldf). Then I run the command:

proxyframe = sqldf("select a, b, c, d, e from myframe where c > 1000")

However, when I look at proxyframe using view(proxyframe) I see that there are clear instances of rows included where c < 1000. I did some further investigations and found that for some reason statements like:

"0" < 6  # returns true
"22" > 6 # returns false

I'm looking for any general tips or ideas at what might be causing my select statements to return odd dataframes, and if it has anything to do with R's weird use of comparison operators. My fear is that somehow column c is a character and the right comparison operator isn't being applied---though I don't have much experience how R works under the cover or how this package works.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
mt88
  • 2,855
  • 8
  • 24
  • 42

1 Answers1

2

Cast the character column to real:

> DF <- data.frame(a = "22", stringsAsFactors = FALSE)
> sqldf("select * from DF where cast(a as real) > 6")
   a
1 22
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you very much. This worked perfectly. I'm guessing this was a type/class issue. If so, quick followup: is there a quick way to mutate the specific columns of my dataframe to the type real so i don't have to use cast syntax? – mt88 Jun 23 '14 at 22:42
  • @mt88 Try `myframe$c <- as.numeric(myframe$c)`. If your `c` column is a factor (run `class(myframe$c)` to check), then you should use `as.numeric(as.character(myframe$c))`. – Gregor Thomas Jun 23 '14 at 23:19