
I deployed a Hadoop cluster with bdutil on Google Compute Engine.
My configuration:
- OS: Ubuntu 14
- Spark: 1.5
- Hive: 0.12
- 1 master node and 2 workers

Hive metastore configuration: (screenshot of the hive-site.xml metastore settings, which point at a shared MySQL database)

I copied hive-site.xml from the Hive conf directory to $SPARK_HOME/conf/hive-site.xml (on the master node only).

When I try to use HiveContext in the PySpark shell, I get this error message:

(two screenshots of the error output; the stack trace includes the message "You must build Spark with Hive")
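
For reference, the steps that trigger the error look roughly like this in the PySpark shell (`sc` is the SparkContext the shell creates automatically):

```python
# PySpark shell (Spark 1.5); sc is the SparkContext provided by the shell
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)          # fails here with the error shown above
sqlContext.sql("SHOW TABLES").show()  # never reached
```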

Does anyone know what is wrong?

Thank you in advance

Raouf
    "You must build Spark with Hive"... So how did you install Spark? – OneCricketeer Aug 09 '16 at 13:58
  • I used bdutil (https://cloud.google.com/hadoop/bdutil), which allows automatic installation of Hive and Spark. What do you mean by "You must build Spark with Hive" (I'm a beginner with Spark)? – Raouf Aug 09 '16 at 14:09
  • I see... Are you aware that each of those component versions is outdated? – OneCricketeer Aug 09 '16 at 14:11
  • That message. It's in the error output – OneCricketeer Aug 09 '16 at 14:12
  • Sorry, the Hive version is 1.2. So you mean that I can't use those components since they are outdated? – Raouf Aug 09 '16 at 14:16
  • I'm sure you can use them, otherwise that tool wouldn't be published. I'm just saying you aren't going to get the benefits of the latest releases, if that matters to you. I've never used `bdutil`, but it appears to be compiling Spark without Hive support. – OneCricketeer Aug 09 '16 at 14:43
  • Another funny detail that nobody has pointed out: Spark implicitly states that it **did not find your `hive-site.xml`** as it reverts to the hard-coded default properties, i.e. an embedded (and empty) Derby database instead of your shared MySQL database. – Samson Scharfrichter Aug 09 '16 at 16:45
  • Hive configuration is managed by the Hadoop libraries, and these libs scan the CLASSPATH for directories where the XML files may be present *(just like Log4J does for its properties file)*. They don't give a sh!t about $SPARK_CONF_DIR. So, either copy/link `hive-site.xml` into the same dir as the core Hadoop conf files -- i.e. $HADOOP_CONF_DIR -- or add something like `/etc/hive/conf` to your `spark.driver.extraClassPath` property. – Samson Scharfrichter Aug 09 '16 at 16:49
  • Thanks @SamsonScharfrichter. To apply the second solution you suggested, I added the line `spark.driver.extraClassPath $HIVE_HOME/conf/` to the **$SPARK_HOME/conf/spark-defaults.conf** file. Is that what you suggest? – Raouf Aug 09 '16 at 20:15
  • Yep. *$$ filler because of silly 15-character rule $$* – Samson Scharfrichter Aug 09 '16 at 21:48
  • Hi @SamsonScharfrichter, it does not work. Do you know what the issue could be? – Raouf Aug 10 '16 at 07:36
  • Hmm... on second thoughts, when you talked about using the `HIVE_HOME` *shell* env variable into a *scala* property file, you mean you **literally** wrote `$HIVE_HOME/conf/`??? Write the real path instead, just to be sure. – Samson Scharfrichter Aug 10 '16 at 08:37
  • I wrote the real path. I used $HIVE_HOME in my comment only to show you the location of the file. – Raouf Aug 10 '16 at 09:35

0 Answers