
I am working with very large data in R. The data lives in Hive, and I connect to it with RJDBC. Because it is impossible to load the table into R, even as a 10% sample, I want to work with a remote table reference instead, using the `tbl()` function from dplyr:

```r
library(dplyr)
transaction <- tbl(conn, "transaction")
```

R returns this error:

```
the dbplyr package is required to communicate with the database backends.
```

I am working on a remote machine where I cannot install packages for this R version. Is there any other way to solve this problem?
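Without dbplyr, `tbl()` cannot build a lazy reference to a database table, but the underlying idea (let Hive do the work and bring back only a small result) can be approximated with plain SQL through DBI, which RJDBC already depends on. A minimal sketch, assuming an existing RJDBC connection `conn`; the helper names and the columns `customer_id` and `amount` are hypothetical:

```r
# Hypothetical helper: builds a GROUP BY query so Hive, not R, scans the raw rows.
build_agg_sql <- function(table, group_col, value_col) {
  sprintf(
    "SELECT %s, COUNT(*) AS n, SUM(%s) AS total FROM %s GROUP BY %s",
    group_col, value_col, table, group_col
  )
}

# Hypothetical wrapper: sends the aggregate query over an existing RJDBC/DBI
# connection and returns only the summarised rows as a data frame.
summarise_in_hive <- function(conn, table, group_col, value_col) {
  DBI::dbGetQuery(conn, build_agg_sql(table, group_col, value_col))
}

# Usage (assumes `conn` is an open RJDBC connection to Hive):
# per_customer <- summarise_in_hive(conn, "transaction", "customer_id", "amount")
```

Only the grouped summary crosses the network, so R never has to hold the full table in memory.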

moodymudskipper
Ninjia123
  • If you are reading only a subset of the dataset, create a subset table in Hive: first drop any old copy with `dbSendUpdate(hivecon, "drop table if exists yourdb.tmptable")`, then create the subset table from the main table with `dbSendUpdate`, and read it with `dbGetQuery`, i.e. `dbGetQuery(hivecon, "SELECT * FROM yourdb.tmptable")`. Make sure that it is Kerberos-compliant. – akrun Aug 25 '17 at 22:07
  • I tried this method, but the data is still too big and R runs extremely slowly on it. – Ninjia123 Aug 25 '17 at 23:37
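The subset-table workflow from the first comment can be sketched as below. This is a minimal sketch, assuming an open RJDBC connection `hivecon`; `yourdb.tmptable` comes from the comment, while the helper names, the source table name, and the filter condition are hypothetical:

```r
# Hypothetical helper: builds the DROP and CREATE TABLE ... AS SELECT statements
# that materialise a filtered subset of the main table inside Hive.
build_subset_sql <- function(src, dest, where) {
  c(
    drop   = sprintf("DROP TABLE IF EXISTS %s", dest),
    create = sprintf("CREATE TABLE %s AS SELECT * FROM %s WHERE %s", dest, src, where)
  )
}

# Hypothetical wrapper: runs both statements with RJDBC::dbSendUpdate(), then
# reads the (much smaller) subset table back into R with DBI::dbGetQuery().
read_hive_subset <- function(hivecon, src, dest, where) {
  sql <- build_subset_sql(src, dest, where)
  RJDBC::dbSendUpdate(hivecon, sql[["drop"]])
  RJDBC::dbSendUpdate(hivecon, sql[["create"]])
  DBI::dbGetQuery(hivecon, sprintf("SELECT * FROM %s", dest))
}

# Usage (the WHERE clause is a made-up example):
# sub <- read_hive_subset(hivecon, "yourdb.transaction", "yourdb.tmptable",
#                         "txn_date >= '2017-01-01'")
```

If the subset is still too large for R, the `WHERE` clause can be tightened, or the `SELECT` inside the `CREATE TABLE` can aggregate instead of copying raw rows.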
