
I am trying to connect to HiveServer2 using the dplyr.spark.hive package, but I am getting an error: I cannot pass a username to the dbConnect function, and this is probably why I receive an error about a NULL client_protocol.

Does anyone know how to fix this, or how to pass a user/username to the dbConnect function when the driver is JDBC?
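For reference, RJDBC's `dbConnect` method accepts the user and password as arguments after the URL (`dbConnect(drv, url, user = "", password = "", ...)`), so in principle the username is forwarded like this. This is only a sketch of the pattern I'm attempting; the jar path, host, and database name are from my own setup, and I'm not certain the `user` argument is honored with `auth=noSasl`:

```r
library(RJDBC)

# Hive JDBC driver; the jar path is specific to my environment
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            classPath = "/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar")

url <- paste0("jdbc:hive2://", "tools-1.hadoop.srv", ":", 10000,
              "/loghost;auth=noSasl")

# user/password are the 3rd and 4th arguments of RJDBC's dbConnect;
# further named arguments are passed as JDBC connection properties
con <- dbConnect(drv, url, user = "mkosinski", password = "")
```

Note that in my transcripts below I used `username =`, which RJDBC treats as an extra connection property rather than the `user` argument.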

This beeline command works fine for me:

beeline  -u "jdbc:hive2://host:port/dbname;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output

but this R equivalent does not:

> library(dplyr.spark.hive)
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’

Attaching package: ‘dplyr.spark.hive’

The following object is masked from ‘package:SparkR’:

    cache

Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’ 
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’ 
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> host = 'tools-1.hadoop.srv'
> port = 10000
> driverclass = "org.apache.hive.jdbc.HiveDriver"
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> library(RJDBC)
> dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
> #url = paste0("jdbc:hive2://", host, ":", port)
> url = paste0("jdbc:hive2://", host, ":", port,"/loghost;auth=noSasl")
> class = "Hive"
> con.class = paste0(class, "Connection") # class = "Hive"
> con = new(con.class, dbConnect(dr, url, username = "mkosinski", database = "loghost"))
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1],  : 
  java.sql.SQLException: Could not establish connection to jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=loghost})
> con = new(con.class, dbConnect(dr, url, username = "mkosinski"))
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1],  : 
  java.sql.SQLException: Could not establish connection to jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=loghost})

EDIT 1

I've tried a different .jar for the connection (as suggested in the comments), and it looks like the previous problem is solved (I had probably used the wrong .jar), but now I receive an error telling me that the connection is somehow not configured:

> Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
> Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
> host = 'tools-1.hadoop.srv'
> port = 10000
> driverclass = "org.apache.hive.jdbc.HiveDriver"
> library(RJDBC)
Loading required package: DBI
Loading required package: rJava
> dr = JDBC(driverclass,classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar"))
> dr2 = JDBC(driverclass,classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
+                                      "/opt/hive/lib/commons-configuration-1.6.jar"))
> url = paste0("jdbc:hive2://", host, ":", port)
> dbConnect(dr, url, username = "mkosinski", database = "loghost") -> cont
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1],  : 
  java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> dbConnect(dr2, url, username = "mkosinski", database = "loghost") -> cont
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1],  : 
  java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8       LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8      LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RJDBC_0.2-5 rJava_0.9-7 DBI_0.3.1  

loaded via a namespace (and not attached):
[1] tools_3.1.3
  • My 2 cents: that error message smells like a client-side config issue for Hadoop / Hive JARs, more than anything else (cf. http://stackoverflow.com/questions/30931599/error-jdbc-hiveconnection-error-opening-session-hive for instance). Which version of HiveServer2 are you connecting to? Which version of the JDBC driver is actually picked from the *CLASSPATH* of the R job? – Samson Scharfrichter Jan 02 '16 at 20:59
  • @SamsonScharfrichter thanks for opening my eyes. I'll check whether all versions are compatible of hive-client, hive-jdbc and hiveserver2 – Marcin Jan 03 '16 at 17:56
  • @SamsonScharfrichter I have checked that my hive version is 1.0.0 and my hive-jdbc.jar is also in version 1.0.0 - I have no idea on how to check the version of HiveServer2. Moreover I have no idea what is CLASSPATH of the R job – Marcin Jan 11 '16 at 16:49
  • Ok, I understand that `classPath` is a vector of paths to the jars needed to connect to HiveServer2 over JDBC. And I should probably add more jars, as listed here https://streever.atlassian.net/wiki/pages/viewpage.action?pageId=4390924 – Marcin Jan 11 '16 at 18:06
  • With Hive 1.0 it should boil down to `hive-jdbc-standalone` (the large one ~ 18 MB) plus, for Kerberos clusters only, `hadoop-common` / `hadoop-auth` / `commons-configuration` plus optionally, just to shut down warnings, `slf4j-log4j12.jar` / `log4j.jar` – Samson Scharfrichter Jan 11 '16 at 19:23
  • Wow. I'll try this tomorrow. Thanks for comment – Marcin Jan 11 '16 at 19:42
  • Thanks @SamsonScharfrichter for suggestion. It looks like `hive-jdbc-standalone` is the proper jar, but now I receive java error about some configuration. Could you be so kind and have a look at my EDIT 1? – Marcin Jan 12 '16 at 10:30
  • Damn - `org.apache.hadoop.conf.Configuration` is in "hadoop-common"; then you have to add "commons-configuration" in the mix – Samson Scharfrichter Jan 12 '16 at 11:17
  • Thanks I think you can post an answer that I had specified wrong jar and that this communication required more than one jar :) – Marcin Jan 12 '16 at 13:26

1 Answer


The problem was the wrong .jar specification (the `classPath` argument to `JDBC`) and a wrong HiveServer2 URL.

The explanation is here: https://stackoverflow.com/a/34792408/3857701
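For completeness, this is the shape of the working call, following Samson's comment that with Hive 1.0 you need `hive-jdbc-standalone` plus `hadoop-common` (which provides `org.apache.hadoop.conf.Configuration`). The jar paths and the exact `hadoop-common` jar name below are from my cluster and are assumptions that will differ on other installations:

```r
library(RJDBC)

# hive-jdbc-standalone alone is not enough: hadoop-common supplies
# org.apache.hadoop.conf.Configuration (adjust paths/versions to your cluster)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
                          "/usr/share/hadoop/share/hadoop/common/hadoop-common.jar"))

# plain URL, without the ;auth=noSasl suffix that broke the earlier attempts
url <- "jdbc:hive2://tools-1.hadoop.srv:10000/loghost"

con <- dbConnect(drv, url, user = "mkosinski")
```

On a Kerberized cluster you would additionally need `hadoop-auth` and `commons-configuration` on the classPath, as noted in the comments.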
