
I have a data frame in R with about 100k IDs. I want to use these IDs in queries against Oracle, MySQL, and Amazon Redshift tables, connecting from R with the packages RODBC, RMySQL, and redshift.

For example,

output <- dbGetQuery(conn, paste(
  "select * from redshift_table",
  "where ID in (select ID from df)"
))

I've seen solutions that use paste to build the IN list when you only have a few IDs, but my list of IDs is very large and I don't think paste will work.

How can I join my R dataframe with these external databases in R? I prefer not to create a temp table in my external database.

Thanks in advance!

Rob
  • I suspect that there aren't other options. The filtering either has to happen in db or in R, and that will involve having all the data in one of those two locations. The paste option won't work because the db will have a max number of items you can put in an IN clause. – joran Aug 15 '15 at 21:07
  • Upload into a temp table and do a semi join against that – hadley Aug 16 '15 at 18:00
  • Hi hadley. Thank you for your response. Are you saying upload the redshift table or the df into a temp table? And where? Into redshift or R? My redshift table is huge so loading all of that into R might not be feasible. – Rob Aug 17 '15 at 18:49
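
A minimal sketch of the temp-table route hadley describes: write only the ID column into a temporary table on the database side, then let the database do the filtering. This assumes conn is a DBI connection (e.g. via odbc or RPostgres; RODBC does not use this interface), that the driver supports temporary = TRUE, and it reuses the redshift_table and ID names from the example above.

library(DBI)

# push only the ID column to the database as a temporary table
dbWriteTable(conn, "id_lookup", df["ID"], temporary = TRUE)

# filter on the database side, pulling back only matching rows
output <- dbGetQuery(conn, "
  select t.*
  from redshift_table t
  join id_lookup i on t.ID = i.ID
")

Because only the 100k IDs travel to the database, the large Redshift table never has to be loaded into R.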

1 Answer


At least for MySQL, the dplyr package offers a solution:

library(dplyr)

# myDB is a dplyr database source, e.g. created with src_mysql()
mysql_table <- tbl(myDB, "mysql_table")

# copy = TRUE uploads df as a temporary table so the semi join
# runs on the database side
mysql_table %>%
    semi_join(df, by = "ID", copy = TRUE)

The other two databases are not supported by this package, although you would be welcome to implement backends for other database systems ;-)
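
For reference, a minimal sketch of how the myDB source used above might be created; src_mysql() is the connector dplyr shipped at the time, and the host and credential values here are placeholders to replace with your own.

library(dplyr)

# placeholder connection details -- substitute your own
myDB <- src_mysql(
  dbname   = "mydb",
  host     = "db.example.com",
  user     = "me",
  password = "secret"
)

mysql_table <- tbl(myDB, "mysql_table")

# collect() pulls the filtered result back into a local data frame
result <- mysql_table %>%
  semi_join(df, by = "ID", copy = TRUE) %>%
  collect()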

MarkusN