
I am getting a "table not found" exception while running a Hive query in Spark, submitted through Oozie version 4.1.0.3 as a java action.

I copied hive-site.xml and hive-default.xml from the HDFS path.

workflow.xml used:

<start to="scala_java"/>
<action name="scala_java">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</job-xml>
        <configuration>
            <property>
                <name>oozie.hive.defaults</name>
                <value>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-default.xml</value>
            </property>
            <property>
                <name>pool.name</name>
                <value>${etlPoolName}</value>
            </property>
            <property>
                <name>mapreduce.job.queuename</name>
                <value>${QUEUE_NAME}</value>
            </property>
        </configuration>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
        <arg>--master</arg>
        <arg>yarn-cluster</arg>
        <arg>--class</arg>
        <arg>HiveFromSparkExample</arg>
        <arg>--deploy-mode</arg>
        <arg>cluster</arg>
        <arg>--queue</arg>
        <arg>testq</arg>
        <arg>--num-executors</arg>
        <arg>64</arg>
        <arg>--executor-cores</arg>
        <arg>5</arg>
        <arg>--jars</arg>
        <arg>datanucleus-api-jdo-3.2.6.jar,datanucleus-core-3.2.10.jar,datanucleus-rdbms-3.2.9.jar</arg>
        <arg>TEST-0.0.2-SNAPSHOT.jar</arg>
        <file>TEST-0.0.2-SNAPSHOT.jar</file>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>

The action fails with:

INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Table not found test_hive_spark_t1)

Exception in thread "Driver" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found test_hive_spark_t1
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:79)
    at org.apache.spark.sql.hive.HiveContext$$anon$1.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:255)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$$anonfun$lookupRelation$3.apply(Catalog.scala:137)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:137)
    at org.apache.spark.sql.hive.HiveContext$$anon$1.lookupRelation(HiveContext.scala:255)

  • The "default" config files are just for **user information** - they are created at install time, from the hard-coded defaults in the JARs. It's the "site" config files that contain useful information, e.g. how to connect to the Metastore (default for that is *"just start an embedded Derby DB with no data inside"*... might explain the "table not found message!) – Samson Scharfrichter Oct 13 '15 at 22:42
  • Thanks for your reply, Samson. I have a valid entry in hive-site.xml. The application runs fine via spark-submit, but through Oozie I am getting the table-not-found exception. – Venkidusamy K Oct 15 '15 at 05:11
  • @VenkidusamyK I am having the same issue. Have you found a solution? – Alex Naspo Apr 06 '16 at 22:34

2 Answers


A. The X-default config files are just for user information; they are created at install time, from the hard-coded defaults in the JARs.

It's the X-site config files that contain the useful information, e.g. how to connect to the Metastore (the default for that is "just start an embedded Derby DB with no data inside"... which might explain the "table not found" message!).

B. Hadoop components search for X-site config files in the CLASSPATH; if they don't find them there, they silently fall back to the defaults.

So you must tell Oozie to download them to the local CWD via <file> instructions, as sketched below. (The exception is an explicit Hive Action, which uses another, explicit convention for its specific hive-site.xml, but that's not the case here.)
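
A minimal sketch of that change, reusing the HDFS path from the workflow in the question (the rest of the java action is unchanged):

    <java>
        ...
        <file>TEST-0.0.2-SNAPSHOT.jar</file>
        <!-- ship hive-site.xml into the container's working directory,
             which is on the CLASSPATH, so the real Metastore settings are found -->
        <file>${nameNode}/user/${wf:user()}/${appRoot}/env/devbox/hive-site.xml</file>
    </java>

With the file in the working directory, the Spark driver picks up the real Metastore connection settings instead of silently falling back to an empty embedded Derby instance.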

– Samson Scharfrichter

  1. hive-default.xml is not needed.
  2. Create a custom hive-site.xml that contains only the hive.metastore.uris property.
  3. Pass the custom hive-site.xml via --files hive-site.xml in the Spark arguments.
  4. Remove the <job-xml> element and the oozie.hive.defaults property (see the sketch after this list).
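
A minimal sketch of both pieces, assuming the custom hive-site.xml is uploaded next to workflow.xml in the application directory, and using a placeholder Thrift URI (replace the host and port with your cluster's actual Metastore address). The custom hive-site.xml:

    <?xml version="1.0"?>
    <configuration>
        <!-- only the Metastore URI; everything else falls back to defaults -->
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://metastore-host:9083</value>
        </property>
    </configuration>

And the corresponding java action, with no <job-xml> element and no oozie.hive.defaults property:

    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
        <arg>--master</arg>
        <arg>yarn-cluster</arg>
        <!-- spark-submit distributes the localized hive-site.xml to the driver and executors -->
        <arg>--files</arg>
        <arg>hive-site.xml</arg>
        ...
        <!-- relative path resolves against the workflow application directory -->
        <file>hive-site.xml</file>
    </java>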