I am exploring the capabilities of Oozie for managing Hadoop workflows. I am trying to set up a shell action that invokes some Hive commands. My shell script hive.sh looks like this:
#!/bin/bash
hive -f hivescript
The Hive script, which I have tested independently, creates some tables and so on. My question is where to keep hivescript and how to reference it from the shell script.
I've tried two ways. The first uses a local path, like hive -f /local/path/to/file. The second uses a relative path as above, hive -f hivescript, in which case I keep hivescript in the Oozie app path directory (the same directory as hive.sh and workflow.xml) and have it sent to the distributed cache via workflow.xml.
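To make the relative-path variant concrete, here is a minimal sketch of a more defensive hive.sh. It assumes Oozie links each file listed in the workflow into the action's working directory under its base name. The function names resolve_script and run_hive_script and the HIVE_BIN override are hypothetical, made up for illustration only; they are not part of Oozie or Hive.

```shell
#!/bin/bash
# Sketch only: check that the Hive script is present in the action's working
# directory before calling hive, so a missing file is reported clearly
# instead of surfacing only as "exit code [1]".

# resolve_script NAME: print the absolute path of NAME in the current
# directory, or list the directory on stderr and return 1.
resolve_script() {
    local name="$1"
    if [ -f "./${name}" ]; then
        printf '%s/%s\n' "$(pwd)" "${name}"
    else
        echo "script not found in working directory: ${name}" >&2
        ls -l >&2
        return 1
    fi
}

# run_hive_script NAME: resolve NAME, then pass it to hive -f.
# HIVE_BIN is a hypothetical override (defaults to "hive") that makes the
# function easy to try without a Hive installation.
run_hive_script() {
    local path
    path="$(resolve_script "$1")" || return 1
    "${HIVE_BIN:-hive}" -f "${path}"
}
```

In hive.sh the last line would then be a call such as run_hive_script hivescript, so that anything written to stderr ends up in the launcher job's log.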
With both methods I get this error message on the Oozie web console:
"Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]"
I've also tried using HDFS paths in the shell script, and as far as I can tell this does not work either.
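For the HDFS-path attempt, one variation would be to copy the script out of HDFS into the working directory first and then run it from there. The sketch below assumes the hadoop client is on the PATH of the node running the action. The function name fetch_script and the HADOOP_BIN override are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: copy a script from HDFS into the current directory so that
# hive -f can be given a plain local file name.

# fetch_script HDFS_PATH: copy HDFS_PATH to ./<basename> with
# "hadoop fs -get" and print the local file name on success.
# HADOOP_BIN is a hypothetical override (defaults to "hadoop").
fetch_script() {
    local src="$1"
    local dest
    dest="$(basename "${src}")"
    # Send the client's own output to stderr so stdout carries only the name
    "${HADOOP_BIN:-hadoop}" fs -get "${src}" "./${dest}" >&2 || return 1
    printf '%s\n' "${dest}"
}
```

With my properties this would be used as local_name="$(fetch_script hdfs://sandbox:8020/user/sandbox/poc1/testwf/hivescript)" followed by hive -f "${local_name}".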
My job.properties file:
nameNode=hdfs://sandbox:8020
jobTracker=hdfs://sandbox:50300
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozieProjectRoot=${nameNode}/user/sandbox/poc1
appPath=${oozieProjectRoot}/testwf
oozie.wf.application.path=${appPath}
And workflow.xml:
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <exec>${appPath}/hive.sh</exec>
        <file>${appPath}/hive.sh</file>
        <file>${appPath}/hive_pill</file>
    </shell>
    <ok to="end"/>
    <error to="end"/>
</action>
<end name="end"/>
My objective is to use Oozie to call a Hive script through a shell script. Any suggestions would be appreciated.