
Unless I'm missing something, the documentation gives no guidance on how to use functions such as db_write_table in practice. The only reference in the package documentation is:

db_write_table(con, table, types, values, temporary = FALSE, ...)

The types argument is not explained, and none of dplyr's remote-database functionality seems to be documented in much detail anywhere, e.g. on dplyr.tidyverse.org.

I have a database mydb with a table mydata, to which I can connect:

require(RPostgreSQL)
require(dplyr)

drv <- dbDriver("PostgreSQL")
sapply(dbListConnections(drv), dbDisconnect)  # close any stale connections first
con <- dbConnect(drv, dbname = "mydb", host = "localhost", port = 5432, user = "username")
data_db = tbl(con, 'mydata')

The remote tbl data_db contains several fields, one of which is called uid. I have a local data frame dat with observations that I need to append to mydata, but only where their dat$uid values are not already present in the table.

Is this something that can be done with db_write_table or does it require a SQL approach?

geotheory

2 Answers


Per @hadley's comment to this answer, you can use dbWriteTable(append = TRUE). Here's an example with SQLite:

library(dplyr)

# Create some example data in an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dd <- data.frame(uid = 1:20, value = LETTERS[1:20])
dat <- data.frame(uid = 15:26, value = LETTERS[15:26])
copy_to(con, dd, "my_data",
  temporary = FALSE, 
  indexes = list("uid", "value")
)

# Identify rows to append
dd_db <- tbl(con, "my_data")
dd_uid <- select(dd_db, uid) %>%
  collect() %>%
  unlist()
dat_to_append <- filter(dat, !(uid %in% dd_uid))

DBI::dbWriteTable(con, "my_data", dat_to_append, append = TRUE)
Weihuang Wong
    Thanks Weihuang. I see a likely performance issue with this for big databases: the whole database `uid` field is downloaded before every append, even for just 1 record. I'm guessing better to use `dat_matched <- select(dd_db, uid) %>% filter(uid %in% dat$uid) %>% collect` and append the result of `anti_join(dat, dat_matched, by="uid")` – geotheory Sep 03 '18 at 08:18
  • Ah, agree. Your approach is more sensible. – Weihuang Wong Sep 03 '18 at 14:03

So, for what it's worth, this is my implementation, which should perform better in my case (small appends to a big database):

db_add_unmatched <- function(con, remote, dat, uid, append = TRUE, row.names = FALSE){
  tbl_remote <- tbl(con, remote)
  # Download only the uid values that also occur in `dat`
  # (.data[[uid]] and !! keep the string `uid` from colliding with the column name)
  dat_matched <- select(tbl_remote, all_of(uid)) %>%
    filter(.data[[uid]] %in% !!dat[[uid]]) %>%
    collect()
  if(nrow(dat_matched) == 0){
    dat_unmatched <- dat
  } else dat_unmatched <- anti_join(dat, dat_matched, by = uid)
  if(nrow(dat_unmatched) > 0){
    DBI::dbWriteTable(con, remote, dat_unmatched, append = append, row.names = row.names)
  } else message('No unmatched rows to add')
}

db_add_unmatched(con, 'mydata', dat, 'uid')
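As a quick sanity check, here is a self-contained sketch of the same idea against an in-memory SQLite database (RSQLite stands in for RPostgreSQL; the table name mydata and the uid values are made up for illustration, and the helper is repeated in a dbplyr-safe form so the snippet runs on its own):

```r
library(dplyr)

# Helper repeated here so the snippet is self-contained;
# .data[[uid]] / !! prevent the string `uid` colliding with the column
db_add_unmatched <- function(con, remote, dat, uid, append = TRUE, row.names = FALSE){
  tbl_remote <- tbl(con, remote)
  dat_matched <- select(tbl_remote, all_of(uid)) %>%
    filter(.data[[uid]] %in% !!dat[[uid]]) %>%
    collect()
  dat_unmatched <- if(nrow(dat_matched) == 0) dat else anti_join(dat, dat_matched, by = uid)
  if(nrow(dat_unmatched) > 0){
    DBI::dbWriteTable(con, remote, dat_unmatched, append = append, row.names = row.names)
  } else message('No unmatched rows to add')
}

# In-memory SQLite stands in for the Postgres connection
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, data.frame(uid = 1:20, value = LETTERS[1:20]), "mydata", temporary = FALSE)

# uids 15:20 already exist in the table, so only 21:26 should be appended
dat <- data.frame(uid = 15:26, value = LETTERS[15:26])
db_add_unmatched(con, "mydata", dat, "uid")

n_rows <- tbl(con, "mydata") %>% collect() %>% nrow()
n_rows  # 26: the 20 seeded rows plus the 6 genuinely new ones
DBI::dbDisconnect(con)
```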
geotheory