I am trying to set up a Spark action workflow in Apache Oozie, but I get the following error when select * from db.table is called from my Spark code in a Hive context:

org.apache.spark.sql.AnalysisException: Table not found: `db`.`table`; line 1 pos 34

The same Spark job works with spark-submit, so I can't pin down the issue. I've added hive-site.xml to the locations recommended in previous questions (the workspace lib directory and the workspace directory itself) and also referenced it in the job.xml setting, but I still get the same error.

I'm running in deploy mode cluster and master yarn.

I've tried many combinations and am not sure what else to do.

Where am I going wrong?

user6666914

1 Answer

You need to add the Hive configuration to the Spark action. For example, add a file element to the action in the workflow that points to where hive-site.xml is located:

<spark xmlns="uri:oozie:spark-action:1.0">
   <!-- ... -->
   <file>${hiveConfig}</file>
</spark>

The property it references must be defined in job.properties:

hiveConfig=/user/oozie/extraconfig/hive-site.xml

This file must be available at that path to every node of the cluster.
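
For context, here is a rough sketch of how the file element might sit inside a complete workflow. Only the file element, ${hiveConfig}, and the yarn/cluster settings come from this thread. The workflow and action names, the class, the jar path, and the ${resourceManager} and ${nameNode} properties are hypothetical placeholders that you would replace with your own values.

```xml
<workflow-app name="spark-hive-wf" xmlns="uri:oozie:workflow:1.0">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <!-- Hypothetical properties, expected to be set in job.properties -->
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <!-- Master and deploy mode as described in the question -->
            <master>yarn</master>
            <mode>cluster</mode>
            <!-- Hypothetical job name, main class, and jar location -->
            <name>SparkHiveJob</name>
            <class>com.example.SparkHiveJob</class>
            <jar>${nameNode}/user/oozie/apps/spark-hive/lib/spark-hive-job.jar</jar>
            <!-- Makes hive-site.xml available to the Spark job -->
            <file>${hiveConfig}</file>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The file element goes after the jar (and after any spark-opts or arg elements), since the spark-action schema expects its children in a fixed order.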