I work around this problem on SQL server using in_schema
and dbExecute
as follows. Assuming Netezza is not too different.
Part 1: shared connection
The first problem is to connect to both tables via the same connection. If we use a different connection then joining the two tables results in data being copied from one connection to the other which is very slow.
con <- dbConnect(...) # as required by your database
table_1 <- dplyr::tbl(con, from = dbplyr::in_schema("db1", "table_name_1"))
table_2 <- dplyr::tbl(con, from = dbplyr::in_schema("db2.schema2", "table_name_2"))
While in_schema
is intended for passing schema names you can also use it for passing the database name (or both with a dot in between).
The following should now work without issue:
# check connection
head(table_1)
head(table_2)
# test join code
left_join(table_1, table_2, by = "id") %>% show_query()
# check left join
left_join(table_1, table_2, by = "id") %>% head()
Part 2: write to datebase
A remote table is defined by two things
- The connection
- The code of the current query (e.g. the result of
show_query
)
We can use these with dbExecute
to write to the database. My example will be with SQL server (which uses INTO
as the keyword, you'll have to adapt to your own environment if the sql syntax is different).
# optional, extract connection from table-to-save
con <- table_to_save$src$con
# SQL query
sql_query <- paste0("SELECT *\n",
"INTO db1.new_table \n", # the database and name you want to save
"FROM (\n",
dbplyr::sql_render(table_to_save),
"\n) subquery_alias")
# run query
dbExecute(con, as.character(sql_query))
The idea is to create a query that can be executed by the database that will write the new table. I have done this by treating the existing query as a subquery of the SELECT ... INTO ... FROM (...) subquery_alias
pattern.
Notes:
- If the sql query produced by
show_query
or sql_render
would work when you access the database directly then the above should work (all that changes is the command is arriving via R instead of via the sql console).
- The functions I have written to smooth this process for me can be found on here. They also include appending, deleting, compressing, indexing, and handling views.
- Writing a table via
dbExecute
will error if the table already exists in the database, so I recommend checking for this first.
- I use this work around in other places, but inserting the database name with
in_schema
has not worked for creating views. To create (or delete) a view I have to ensure the connection is to the database where I want the view.