0

first time question, so if I missed something I apologize:

I imported an excel file into R! using XLconnect, the str() function is as follow:

data.frame':    931 obs. of  5 variables:
 $ Media        : chr  "EEM" "EEM" "EEM" "EEM" ...
 $ Month        : POSIXct, format: "2014-08-01" "2014-08-01" "2014-08-01" "2014-08-01" ...
 $ Request_Row  : num  8 25 26 37 38 44 53 62 69 83 ...
 $ Total_Click  : num  12 9 9 8 8 8 7 7 7 7 ...
 $ Match_Type   : chr  "S" "S" "S" "S" ...

when I use the following sqldf I get no rows selected, anyway to what could be wrong:

sqldf(" select Media, sum(Total_Click) , avg(Request_Row), min(Request_Row) , max(Request_Row), count(distinct(Media)) from All_Data 
      where Request_Row < 100  
      and month='2014-09-01'
      group by 1,2 order by 2,6 desc ")

<0 rows> (or 0-length row.names)

Thanks for the help

Vj

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • does the month '2014-09-01' actually exist in the excel file? – Alex Dec 02 '14 at 00:29
  • 1
    And do note that `month != Month`. (R is case sensitive and you might think about fixing your question title since it's "just ugly" at the moment.) – IRTFM Dec 02 '14 at 00:41
  • Hearing no response I will assume that the spelling is the issue and vote to close as a simple typo. – IRTFM Dec 02 '14 at 00:58
  • Thanks Guys! this solved the problem, strftime('%Y-%m-%d', month, 'unixepoch', 'localtime') ='2014-08-01' – user4313679 Dec 02 '14 at 03:41
  • The problem here was not just month/Month. There were multiple problems with the code as outlined in my answer so I am voting to re-open. – G. Grothendieck Dec 02 '14 at 19:49

1 Answers1

0

Its not clear what is intended but the code shown has these problems:

  1. Month is used in the data but month is used in the SQL statement

  2. SQLite has no date or time types and so if you send a POSIXct value to SQLite it will be interpreted as the number of seconds since the UNIX epoch (in GMT time zone). Thus the comparison of the month to a character string won't work. You can convert the number of seconds to yy-mm-dd using the SQLite strftime or date functions. Alternately use a database that has datetime types. sqldf supports the H2 database and it supports date and time types.

  3. The statement is trying to group by Media and sum(Total_Click). Grouping by an aggregated value is not legal although perhaps it could be done by nesting selects depending on what you intended.

  4. Since the statement is grouping by Media the expressoin count(distinct(Media)) fromAll_Data will always be 1 since there can only be one Media in such a group.

You will need to clarify what your intent is but if we drop or fix up the various points we can get this:

sqldf("select 
          Media,
          sum(Total_Click) sum_Total_Click, 
          avg(Request_Row) avg_Request_Row, 
          min(Request_Row) min_Request_Row,
          max(Request_Row) max_Request_Row
    from All_Data
    where Request_Row < 100
    and date(month, 'unixepoch', 'localtime') = '2014-08-01'
    group by 1 order by 2 desc")

which gives:

  Media sum_Total_Click avg_Request_Row min_Request_Row max_Request_Row
1   EEM              38              24               8              37

RH2 To use the RH2 package and H2 database instead be sure you have Java and RH2 installed (RH2 includes the H2 database so that does not need to be separately installed) and then:

library(RH2)
library(sqldf)
sqldf("...")

where the ... is replaced with the same SQL statement except the date comparison simplifies to this line:

and month = '2014-08-01'

Data: When posting to the SO R tag please show your data using dput. In this case this was used:

All_Data <-
structure(list(Media = c("EEM", "EEM", "EEM", "EEM"), Month = structure(c(1406865600, 
1406865600, 1406865600, 1406865600), class = c("POSIXct", "POSIXt"
), tzone = ""), Request_Row = c(8, 25, 26, 37), Total_Click = c(12, 
9, 9, 8), Match_Type = c("S", "S", "S", "S")), .Names = c("Media", 
"Month", "Request_Row", "Total_Click", "Match_Type"), row.names = c(NA, 
-4L), class = "data.frame")

Update: Misc revisions.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • I do apologize for delay in response ; The month to Month typo isn't the problem I have tested that, it was a typo here not in the original R code Thanks VJ – user4313679 Dec 02 '14 at 01:42