
I'm using RJDBC 0.2-5 to connect to Hive from RStudio. My server has hadoop-2.4.1 and hive-0.14. I followed the steps below to connect to Hive.

library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", 
            c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
              "/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-service-0.14.0.jar",
              "/home/packages/hive/New folder3/libfb303-0.9.0.jar",
              "/home/packages/hive/New folder3/libthrift-0.9.0.jar",
              "/home/packages/hive/New folder3/log4j-1.2.16.jar",
              "/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
              "/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
              "/home/packages/hive/New folder3/hive-common-0.14.0.jar",
            "/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
            "/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
             "/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
            identifier.quote="`")

conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
                  "usr",
                  "pwd")

But I always get the following error:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars

I even tried different versions of the Hive jars, including hive-jdbc-standalone.jar, but nothing seems to work. I also tried RHive to connect to Hive, with no success.
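One way to narrow down a NoClassDefFoundError like this is to confirm, from the R side, which jars rJava actually sees and whether the missing class can be resolved. A minimal sketch (the path is taken from the question's own jar list; adjust for your setup):

```r
library(rJava)
.jinit()

# jars currently visible to rJava
print(.jclassPath())

# try to resolve the class named in the error; an exception here
# means the jar that provides it is not on the classpath
cls <- tryCatch(.jfindClass("org.apache.hadoop.hive.conf.HiveConf"),
                error = function(e) NULL)
if (is.null(cls)) {
  # HiveConf normally lives in hive-common-*.jar
  .jaddClassPath("/home/packages/hive/New folder3/hive-common-0.14.0.jar")
}
```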

Can anyone help me? I'm kind of stuck :(

user2538041

2 Answers


I didn't try RHive because it seems to need a complex installation on all the nodes of the cluster.

I successfully connected to Hive using RJDBC. Here is a code snippet that works on my Hadoop 2.6 CDH 5.4 cluster:

#loading libraries
library("DBI")
library("rJava")
library("RJDBC")

#init of the classpath (works with Hadoop 2.6 on a CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar",
       "/usr/lib/hadoop/client/hadoop-common.jar",
       "/usr/lib/hive/lib/libthrift-0.9.2.jar",
       "/usr/lib/hive/lib/hive-service.jar",
       "/usr/lib/hive/lib/httpclient-4.2.5.jar",
       "/usr/lib/hive/lib/httpcore-4.2.5.jar",
       "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

#initialize the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

#work with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases

The hardest part is finding all the needed jars and where to get them ...
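Rather than hunting the jars down one by one, a shortcut is to put every jar from the Hive lib directory on the classpath and prune later. A sketch, assuming the CDH default locations used above:

```r
library(rJava)

# gather every jar under the Hive lib directory
# (CDH default path; adjust for your distribution)
cp <- list.files("/usr/lib/hive/lib", pattern = "\\.jar$", full.names = TRUE)

# hadoop-common ships separately from the Hive jars
cp <- c(cp, "/usr/lib/hadoop/client/hadoop-common.jar")

.jinit(classpath = cp)
```

This trades a bigger classpath for not having to chase each NoClassDefFoundError individually.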

UPDATE: the Hive standalone JAR contains everything needed to use Hive; this standalone JAR together with the hadoop-common jar is enough.

So this is a simplified version: no need to worry about any jars other than hadoop-common and hive-jdbc-standalone.

#loading libraries
library("DBI")
library("rJava")
library("RJDBC")

#init of the classpath (works with Hadoop 2.6 on a CDH 5.4 installation)
cp = c("/usr/lib/hadoop/client/hadoop-common.jar",
       "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

#initialize the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

#work with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
loicmathieu
  • Thanks loicmathieu for your reply. I have taken all the jars from the lib directories of Hive and Hadoop installed on my server. Two things are different here: first, I am trying to connect to a remote server rather than localhost; second, HiveServer2 is not configured on my server, so I had to use hive in the connection URL ("jdbc:hive://myserver:10000/...). Can you suggest anything further? – user2538041 Oct 09 '15 at 07:07
  • Hi, I also connect to a remote Hive (I used localhost only as an example), but via a HiveServer2 server. You can try the same approach with the org.apache.hadoop.hive.jdbc.HiveDriver driver and the JDBC URL you use. The main point is to define a correct classpath and call .jinit with it (in your example you don't use the same initialisation code as mine). So I suggest you initialise rJava the same way as in my example (define cp, then call .jinit()); I just added the needed jars one by one until I had no more NoClassDefFoundError ... – loicmathieu Oct 09 '15 at 09:11
  • I also followed the same steps you mentioned, but am stuck with the same error. However, when I checked hive-common-0.14.jar, the HiveConf class file is present in the appropriate package; not sure why the JVM is not picking it up. I even checked my JVM version against the JVM version the jar was built with; mine is newer, so that should not be an issue :( – user2538041 Oct 09 '15 at 12:14
  • The order of these statements mattered for me. `library("DBI") library("rJava") library("RJDBC") drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/hive-jdbc.jar", identifier.quote="\`") .jclassLoader()$setDebug(1L) for(l in list.files('')){ .jaddClassPath(paste("", l , sep=""))} conn <- dbConnect(drv, "jdbc:hive2://:10000/", "username", "password") show_databases <- dbGetQuery(conn, "show databases") show_databases` – Sandeep Jun 14 '17 at 23:31
  • Also, here's the list of jars that I used `hadoop-common.jar hive-service.jar libthrift-0.9.3.jar slf4j-log4j12.jar hive-jdbc-standalone.jar httpclient-4.2.5.jar log4j-1.2.17.jar hive-jdbc.jar httpcore-4.2.5.jar slf4j-api.jar` – Sandeep Jun 14 '17 at 23:36
  • It was not working for me either when connecting from a remote machine. I simply copied the driver files to the R host and replaced the paths above, and it worked! The files I copied: hive-jdbc-1.2.0-mapr-1707-standalone.jar, hadoop-common-2.7.0-mapr-1703.jar – sjd Mar 15 '18 at 07:42
  • I also followed the code; for CDH 5.9, the jars are hadoop-common-2.6.0-cdh5.9.1.jar and hive-jdbc-1.1.1-standalone.jar – Carlos Gomez Apr 08 '19 at 21:14

loicmathieu's answer works for me now, after I switched to an older Hive jar, for example from 3.1.1 to 2.0.0.

Unfortunately I can't comment on his answer, which is why I have written another one.

If you run into the following error, try an older version:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://host_name: Could not establish connection to jdbc:hive2://host_name:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
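On the R side, the only change needed is to point JDBC() at the older standalone jar. A sketch; the path and version number below are illustrative, not from a specific installation:

```r
library(RJDBC)

# point the driver at an older hive-jdbc standalone jar
# (2.0.0 here is just an example; pick a version your server accepts)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            "/usr/lib/hive/lib/hive-jdbc-2.0.0-standalone.jar",
            identifier.quote = "`")
```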

Stipe
  • I'm not asking a question; I'm giving a hint for an error people might encounter when they use the first answer, and pointing to a possible solution as well. – Stipe Jul 02 '19 at 12:03