
I deployed a Hadoop cluster with bdutil on Google Compute Engine.
My configuration:
- OS: Ubuntu 14
- Spark: 1.5
- Hive: 0.12
- 1 master node and 2 workers

Hive metastore configuration: (screenshot of the hive-site.xml metastore settings, which point at a shared MySQL database)

I copied hive-site.xml from the Hive conf directory to $SPARK_HOME/conf/hive-site.xml (on the master node only).

When I try to use HiveContext in the PySpark shell, I get this error message:

(two screenshots of the error output; the stack trace includes the message "You must build Spark with Hive")
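
For reference, the steps that trigger the error look roughly like this in the PySpark shell (`sc` is the SparkContext the shell creates automatically):

```python
# PySpark shell (Spark 1.5); sc is the SparkContext provided by the shell
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)          # fails here with the error shown above
sqlContext.sql("SHOW TABLES").show()  # never reached
```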

Does anyone know what is wrong?

Thank you in advance

Raouf
    "You must build Spark with Hive"... So how did you install Spark? – OneCricketeer Aug 09 '16 at 13:58
  • I used bdutil (https://cloud.google.com/hadoop/bdutil), which allows automatic installation of Hive and Spark. What do you mean by "You must build Spark with Hive" (I'm a beginner with Spark)? – Raouf Aug 09 '16 at 14:09
  • I see... Are you aware that each of those component versions is outdated? – OneCricketeer Aug 09 '16 at 14:11
  • That message. It's in the error output – OneCricketeer Aug 09 '16 at 14:12
  • Sorry, the Hive version is 1.2. So you mean that I can't use those components since they are outdated? – Raouf Aug 09 '16 at 14:16
  • I'm sure you can use them, otherwise that tool wouldn't be published. I'm just saying you aren't going to get the benefits of the latest releases, if that matters to you. I've never used `bdutil`, but it appears to be compiling Spark without Hive support. – OneCricketeer Aug 09 '16 at 14:43
  • Another funny detail that nobody has pointed out: Spark implicitly states that it **did not find your `hive-site.xml`** as it reverts to the hard-coded default properties, i.e. an embedded (and empty) Derby database instead of your shared MySQL database. – Samson Scharfrichter Aug 09 '16 at 16:45
  • Hive configuration is managed by the Hadoop libraries, and these libs scan the CLASSPATH for directories where the XML files may be present *(just like Log4J does for its properties file)*. They don't give a sh!t about $SPARK_CONF_DIR. So, either copy/link `hive-site.xml` into the same dir as the core Hadoop conf files -- i.e. $HADOOP_CONF_DIR -- or add something like `/etc/hive/conf` to your `spark.driver.extraClassPath` property. – Samson Scharfrichter Aug 09 '16 at 16:49
  • Thanks @SamsonScharfrichter. To apply the second solution you suggested, I added the line `spark.driver.extraClassPath $HIVE_HOME/conf/` to the **$SPARK_HOME/conf/spark-defaults.conf** file. Is that what you suggest? – Raouf Aug 09 '16 at 20:15
  • Yep. *$$ filler because of silly 15-character rule $$* – Samson Scharfrichter Aug 09 '16 at 21:48
  • Hi @SamsonScharfrichter, it does not work. Do you know what the issue could be? – Raouf Aug 10 '16 at 07:36
  • Hmm... on second thoughts, when you talked about using the `HIVE_HOME` *shell* env variable into a *scala* property file, you mean you **literally** wrote `$HIVE_HOME/conf/`??? Write the real path instead, just to be sure. – Samson Scharfrichter Aug 10 '16 at 08:37
  • I wrote the real path. I used $HIVE_HOME in my comment only to show you the location of the file. – Raouf Aug 10 '16 at 09:35

0 Answers