Questions tagged [rjdbc]

RJDBC - Allows the use of JDBC to access databases from R

About

RJDBC is an package providing a database interfaces using . This allows the use of any in R through the JDBC interface. The only requirement is working and a JDBC driver for the database engine to be accessed. RJDBC uses the interface which is also used by the packages , , , and .

Example

library(RJDBC)

drv <- JDBC("com.mysql.jdbc.Driver",
           "/etc/jdbc/mysql-connector-java-3.1.14-bin.jar",
           identifier.quote="`")

conn <- dbConnect(drv, "jdbc:mysql://localhost/test", "user", "pwd")

Loads a JDBC driver for MySQL (adjust the path to the driver's JAR file as necessary) and connects to the local database "test". The connection handle conn is used for all subsequent operations.

(Note: Windows users can use drv<-JDBC("sun.jdbc.odbc.JdbcOdbcDriver") to use Sun's JDBC to ODBC Bridge with "jdbc:odbc:..." URL.)

In order to allow more complex names for tables and columns, it is a good idea to set identifier.quote to the quote character supported by the database for quoting identifiers. The default (NA) does not quote identifiers, but this limits the names that can be used, affecting dbReadTable and dbWriteTable.

dbListTables(conn)
data(iris)
dbWriteTable(conn, "iris", iris, overwrite=TRUE)
dbGetQuery(conn, "select count(*) from iris")
d <- dbReadTable(conn, "iris")

RJDBC supports prepared statements and argument substitution, so it is possible to run queries like:

dbGetQuery(conn, "select count(*) from iris where Species=?", "setosa")

Note that the life time of a connection, result set, driver etc. is determined by the lifetime of the corresponding R object. Once the R handle goes out of scope (or if removed explicitly by rm) and is garbage-collected in R, the corresponding connection or result set is closed and released. This is important for databases that have limited resources (like Oracle) - you may need to add gc() by hand to force garbage collection if there could be many open objects. The only exception are drivers which stay registered in the JDBC even after the corresponding R object is released as there is currently no way to unload a JDBC driver (in RJDBC).

Type Handling

Type-handling is a rather complex issue, especially with JDBC as different databases support different data types. RJDBC attempts to simplify this issue by internally converting all data types to either character or numeric values. When retrieving results, all known numeric types are converted to R's numeric representation and all other types are treated as characters. When assigning parameters in parametrized queries, numeric, integer and character are the types used. Convenience methods like dbReadTable and dbWriteTable can only use the most basic SQL types, because they don't know what DBMS will be used. Therefore dbWriteTable uses only INTEGER, DOUBLE PRECISION or VARCHAR(255) to create the table. For all other types you'll have to use DBML statements directly.

License

RJDBC is released under v2.

Repositories

Other resources

Related tags

200 questions
0
votes
1 answer

RODBC posixct date field makes query very slow

I need help with RODBC sqlQuery. I run a SQL script that produces 11mill rows of data. It takes 78 secs to pull data using RODBC but unfortunately when I include a date/time field, it takes 180 secs to pull data from RODBC and only 78 secs in…
charliealpha
  • 307
  • 2
  • 12
0
votes
0 answers

Supressing INFO messages in R

I want to supress INFO messages on execution of some statement. Any help? connection = IIP.JDBCConnection() 17/08/30 09:22:01 INFO jdbc.Utils: Supplied authorities: 10.177.119.89:10000 17/08/30 09:22:01 INFO jdbc.Utils: Resolved authority:…
Ujjwal Gupta
  • 13
  • 1
  • 5
0
votes
0 answers

How to write a table reference working with rjdbc?

I am working with really big data with R. My data is on Hive and I am using rjdbc. I am thinking of using a reference table on R because its impossible to load the table onto R even just using 10% sample. I am using the tbl function from…
Ninjia123
  • 47
  • 6
0
votes
1 answer

RJDBC - Could not open client transport with JDBC Uri

I'm trying to connect to my Biginsights Enterprise cluster from RJDBC: hiveconnection <- dbConnect(drv, "jdbc:hive2://xxxxx:10001/default", "xxxxx", "xxxxx", ssl="true", sslTrustStore="mytruststore.jks", …
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
0
votes
1 answer

Define column class when data is impoted using RJDBC in R

I am trying to import a very large data set from a HANA database in R. One of the problems of the RJDBC package is that all columns with characters are loaded as character column type. In our case loading the column as a factor would be much more…
Tobias Dekker
  • 980
  • 8
  • 19
0
votes
1 answer

Error in connecting R to Redshift

I'm trying to connect RStudio to Amazon Redshift via JDBC and this is what I tried to run: driver <- JDBC("com.amazon.redshift.jdbc42.Driver", "~/Downloads/RedshiftJDBC42-1.2.1.1001.jar", identifier.quote="`") # url <-…
charmander
  • 957
  • 1
  • 8
  • 21
0
votes
1 answer

Merge Hive tables using RJDBC package

I am using RJDBC package to connect to Hive. library(rJava) library(RJDBC) - - - conn <- dbConnect(drv, "jdbc:hive2://ip:port","***", "****") After getting connected I type this command to see the list of tables: dbListTables(conn) [1] "m_11" …
ROY
  • 268
  • 2
  • 11
0
votes
1 answer

Putting dbSendQuery into a function in R

I'm using RJDBC in RStudio to pull a set of data from an Oracle database into R. After loading the RJDBC package I have the following lines: drv = JDBC("oracle.jdbc.OracleDriver", classPath="C:/R/ojdbc7.jar", identifier.quote = " ") conn =…
Qaribbean
  • 178
  • 2
  • 3
  • 17
0
votes
1 answer

Applying if logic in SQL query?

my Database table is in the format as shown below : Current Year = 2017 ID Bought Year 1 A 2016 1 A 2015 2 A 2013 2 B 2015 2 B 2014 3 A 2014 4 A 2014…
Learner_seeker
  • 544
  • 1
  • 4
  • 21
0
votes
3 answers

Include a hashtag in dbGetQuery()

I'm trying to use RJDBC to connect to a SAP HANA database and query for a temporary table, which is stored with a #-prefix: test <- dbGetQuery(jdbcConnection, "SELECT * FROM #CONTROL_TBL") # Error in [...]: invalid table name: Could not find…
laterstat
  • 41
  • 1
  • 5
0
votes
0 answers

RJDBC Parallel query in R

I have a code header that looks something like this in SQL Developer: ALTER SESSION ENABLE PARALLEL QUERY SELECT /*+PARALLEL(dfam 32)*/ distinct dfam.a,dfam.b from d_f_a_market dfam; I am trying to use parallel query in R using RJDBC. I…
Aizen
  • 561
  • 1
  • 9
  • 19
0
votes
1 answer

Unable to establish Cassandra connection in R using RJDBC

casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9042/quantum_cassandra") 12:31:02.140 [main] DEBUG c.datastax.driver.jdbc.SessionHolder - Final Properties to Connection: {user=, password=, portNumber=9042,…
0
votes
1 answer

ROracle, ORA-30204: buffer is not large enougth

When using ROracle library to connect Oracle to R, following error occurred (for "select" queries): ORA-30204: buffer is not large enougth. When using RJDBC library everything is OK, but very slow. How can we avoid this error?
0
votes
1 answer

ORA-01033 error with RJDBC package from R

I use the RJDBC package to connect to the data bases at my working place. Until recently, the following code has worked properly: connection1<-dbConnect(driver, address1, username1, password1) connection2<-dbConnect(driver, address2, username2,…
Vuk
  • 1
0
votes
3 answers

Teradata JDBC Error 1338 after certain number of batch executions

I am trying to insert 3.8M records into a Teradata table containing 14 columns, through a batch of 10000 records using RJDBC package of R. It always crashes after inserting 380000 records i.e. at 39th batch execution. This is the error I get, Error…
SanjayIV
  • 748
  • 5
  • 6