0

I am using sqldf to train some r-users in SQL. Loading RPostgreSQL or RH2 before loading sqldf will change the default SQL used by sqldf. This works fine with h2, but every time I load RPostgreSQL, R will crash on first attempted query. I want to make sqldf work with RPostgreSQL to enable use of date functions which are lacking in SQLite and h2.

# packages
library(tidyverse)
# library(RH2) # comment out this row or the next to determine default SQL for sqldf
library(RPostgreSQL) 
library(sqldf)

Output confirms that "sqldf will default to using PostgreSQL"

Create some data:

set.seed(42)
N <- 1e6

sales <- tibble(
    buyer_id = 1:N/100,
    sales_date = sample(seq(as.Date('2018/01/01'), as.Date('2021/01/01'), by="day"), N, replace = TRUE),
    sales_amount = rpois(N, 200)
) 

Crash R:

sqldf("
select 
    max(sales_date) 
from sales
")

"R Session Aborted R encountered a fatal error The session was terminated"

Joe
  • 3,217
  • 3
  • 21
  • 37
  • The order of loading sqldf and RPostgreSQL does not matter. Also did you read `?sqldf` and properly configure it for postgresql? Di you install postgreSQL? Can you use it without R? h2 has 18 time and date functions so you may not need postgresql. http://www.h2database.com/html/functions.html – G. Grothendieck Nov 19 '21 at 18:07
  • @G.Grothendieck The key date function lacking in h2 is date_trunc(). I don't see an equivalent function that allows for rapid grouping of dates by week, month, qtr, yr in h2. An h2 user would have to parse out year and week or year month in two steps and then recombine. – Joe Nov 19 '21 at 18:08
  • I have read ?sqldf, and configured accordingly. The bug seems to be with RPostgreSQL. – Joe Nov 19 '21 at 18:09
  • (1) If you just need that function for grouping then in h2 use formatdatetime. or use the year, month, etc. functions and combine them to get what you need. (2) I believe that h2 actually does have trunc_date but it is only turned on when in postgresql mode since it is not standard SQL. You might need to modify RH2 to do that. – G. Grothendieck Nov 19 '21 at 18:27

1 Answers1

1

If the only reason not to use H2 is date_trunc then here are some ways around that.

1) Macros Here is a workaround for truncation to the beginning of year/quarter/month/week. These functions act as macros which expand into code accepted by H2. Be sure to prefix sqldf with fn$ and surround each with back quotes to turn on substitution. Note that in each case the argument is a string. Add the verbose=TRUE argument to sqldf to see the generated code.

library(RH2)
library(sqldf)

trunc_year <- function(x) sprintf("DATEADD(day, 1-day_of_year(%s), %s)", x, x)
trunc_qtr <- function(x) sprintf("DATEADD(month, 3*(quarter(%s)-1), %s)", x, 
  trunc_year(x))
trunc_month <- function(x) sprintf("DATEADD(day, 1-day_of_month(%s), %s)", x, x)
trunc_week <- function(x) sprintf("DATEADD(day, -iso_day_of_week(%s), %s)", x, x)

# test
DF <- data.frame(x = as.Date("2021-11-15"))
fn$sqldf("select x, 
                `trunc_year('x')` year, 
                `trunc_qtr('x')` qtr,
                `trunc_month('x')` month,
                `trunc_week('x')` week
          from DF")
##            x       year        qtr      month       week
## 1 2021-11-15 2021-01-01 2021-10-01 2021-11-01 2021-11-14

2) Patch RH2 Another possibility is to patch your installation of RH2 with the newer version of H2 which has date_trunc. To do this below we remove the the h2-1.3.175.jar file in RH2, which contains H2, and replace it with the newer version h2-1.4.200.jar. This should work as long as you have write permission in the java subdirectory of your RH2 installation. Just run this code in R and your RH2 installation will be patched.

u <- "https://h2database.com/h2-2019-10-14.zip"
z <- basename(u)
tmp <- tempdir()  # create temporary dir 
old.dir <- setwd(tmp)  # go to it
download.file(u, z)  # download u
unzip(z, "h2/bin/h2-1.4.200.jar")  # extract jar from zip file

# copy new jar file to RH2 installation and remove old one
new.jar <- file.path(tmp, "h2", "bin", "h2-1.4.200.jar")
old.jar <- dir(system.file("java", package = "RH2"), "h2.*jar$", full.names = TRUE)
if (basename(old.jar) != basename(new.jar)) {
  file.copy(new.jar, dirname(old.jar))
  file.remove(old.jar)
}
setwd(old.dir)  # return to original directory

# test
library(RH2)
library(sqldf)
sqldf("select h2version()")
##   '1.4.200'
## 1   1.4.200
# see more tests in (3) below

3) Rebuild RH2 Another possibility is create a new source version of RH2 with the newer version of H2 that has date_trunc. The first argument is 'year', 'quarter', 'month' or 'week' and the second argument is a date. Thus, if you are willing to rebuild RH2 with this new H2 version then it is available.

(a) Download the source of RH2 from https://cran.r-project.org/package=RH2 and detar it to create a directory tree RH2.

(b) Download this h2 zip https://h2database.com/h2-2019-10-14.zip and extract h2/bin/h2-1.4.200.jar from that zip file and place the jar file in the RH2/inst/java directory of the RH2 source removing the old one, h2-1.3.175.jar

(c) rebuild and test RH2 like this:

# rebuild RH2
# cd to the RH2 directory containing its DESCRIPTION file
setwd("...whatever.../RH2")

# build RH2 
#  on Windows this needs Rtools40 (not an R package)
#  https://cran.r-project.org/bin/windows/Rtools/
library(devtools)
build()
install(args = "--no-multiarch")

# test
library(RH2)
library(sqldf)
sqldf("select h2version()") # check it's the latest version
##   '1.4.200'
## 1   1.4.200

DF <- data.frame(x = as.Date("2000-11-15"))
sqldf("select date_trunc('month', x) from DF")
##   DATE_TRUNC('month', x)
## 1             2000-11-01
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341