
This is a cluster with Hadoop 2.5.0, Spark 1.2.0, and Scala 2.10, provided by CDH 5.3.2. I am using a compiled spark-notebook distribution.

It seems Spark-Notebook cannot find the Hive metastore by default.

How do I specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?

Here is what I tried:

  1. Linked all files from /etc/hive/conf, including hive-site.xml, into the current directory

  2. Set the SPARK_CONF_DIR environment variable in bash


2 Answers


When you start the notebook, set the EXTRA_CLASSPATH environment variable to the path where your hive-site.xml is located. This works for me:

EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook

I also passed the jar of my MySQL connector because my Hive metastore is backed by MySQL.

I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351
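Once the notebook picks up hive-site.xml, a quick way to verify the connection is to create a HiveContext in a notebook cell and list the tables the metastore knows about. Here is a minimal sketch for Spark 1.2, assuming spark-notebook exposes the SparkContext under the name sparkContext (adjust if your binding differs):

import org.apache.spark.sql.hive.HiveContext

// Build a HiveContext on top of the notebook's SparkContext; it reads
// hive-site.xml from the classpath that EXTRA_CLASSPATH set up above.
val hiveContext = new HiveContext(sparkContext)

// If the metastore is reachable, this prints the tables it knows about.
hiveContext.sql("SHOW TABLES").collect().foreach(println)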


Using the CDH 5.5.0 Quickstart VM, the solution is the following: you need to make hive-site.xml visible to the notebook, since it provides the access information for the Hive metastore. By default, spark-notebook uses an internal metastore.

You can then define the following environment variable in ~/.bash_profile:

HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CONF_DIR

(Make sure you run source ~/.bash_profile if you do not open a new terminal.)

(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)
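To check that the notebook is now talking to the CDH metastore rather than the internal one, you can list the databases from a notebook cell. A small sketch for Spark 1.2 (again assuming the SparkContext binding is named sparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sparkContext)

// Against the internal (embedded) metastore this typically prints only
// "default"; once HADOOP_CONF_DIR points at the Cloudera Hive config,
// you should see the databases defined in your CDH metastore.
hiveContext.sql("SHOW DATABASES").collect().foreach(println)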
