
This is a cluster with Hadoop 2.5.0, Spark 1.2.0, and Scala 2.10, provided by CDH 5.3.2. I am using a compiled spark-notebook distribution.

It seems Spark-Notebook cannot find the Hive metastore by default.

How do I specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?

Here is what I tried:

  1. Linked all files from /etc/hive/conf, including hive-site.xml, into the current directory

  2. Set the SPARK_CONF_DIR environment variable in bash


2 Answers


When you start the notebook, set the EXTRA_CLASSPATH environment variable to the path where your hive-site.xml is located. This works for me:

EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook

I also passed the jar of my MySQL connector because my Hive metastore is backed by MySQL.

I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351
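Once the notebook picks up hive-site.xml, a quick way to verify the connection is to create a HiveContext in a notebook cell and list the tables the metastore knows about. Here is a minimal sketch for Spark 1.2, assuming spark-notebook exposes the SparkContext under the name sparkContext (adjust if your binding differs):

import org.apache.spark.sql.hive.HiveContext

// Build a HiveContext on top of the notebook's SparkContext; it reads
// hive-site.xml from the classpath that EXTRA_CLASSPATH set up above.
val hiveContext = new HiveContext(sparkContext)

// If the metastore is reachable, this prints the tables it knows about.
hiveContext.sql("SHOW TABLES").collect().foreach(println)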


Using the CDH 5.5.0 Quickstart VM, the solution is the following: you need to make hive-site.xml visible to the notebook, since it provides the access information for the Hive metastore. By default, spark-notebook uses an internal metastore.

You can then define the following environment variable in ~/.bash_profile:

HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CONF_DIR

(Make sure you run source ~/.bash_profile if you do not open a new terminal.)

(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)
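To check that the notebook is now talking to the CDH metastore rather than the internal one, you can list the databases from a notebook cell. A small sketch for Spark 1.2 (again assuming the SparkContext binding is named sparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sparkContext)

// Against the internal (embedded) metastore this typically prints only
// "default"; once HADOOP_CONF_DIR points at the Cloudera Hive config,
// you should see the databases defined in your CDH metastore.
hiveContext.sql("SHOW DATABASES").collect().foreach(println)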
