
I am running Oozie 4.0.1 on Elastic MapReduce using the 3.0.4 AMI (Hadoop 2.2.0). I've built Oozie from source, and everything installs and seems to work correctly, up to the point of actually scheduling a job. That is, I can connect to the Web Console, submit and kill jobs using the 'oozie' command, etc. BUT... actions (I've tried "Hive" and "Shell" so far) go into PREP state (according to the Oozie Web Console) and never actually start.

I've tried both coordinator (cron) jobs and basic workflow jobs, and got the same behavior in both cases: the workflow reaches the Hive or Shell action node and then hangs.

For the basic workflow task, here's what the job.properties looks like:

nameNode=hdfs://ip-redacted.ec2.internal:9000                                                                                                                                                              
jobTracker=ip-redacted.ec2.internal:9026

queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell
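For reference, a workflow with a properties file like this would normally be submitted and started via the Oozie CLI. A minimal sketch, assuming the file above is saved as job.properties and the Oozie server URL shown is a placeholder for the real one:

```shell
# Submit and start the workflow in one step (server URL is hypothetical;
# it can also be set once via the OOZIE_URL environment variable).
oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# This prints a workflow job ID, which can then be polled for state
# transitions (PREP -> RUNNING -> SUCCEEDED, etc.):
oozie job -oozie http://localhost:11000/oozie -info <job-id>
```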

and the workflow.xml looks like:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>echo</exec>
            <argument>my_output=Hello Oozie</argument>
            <capture-output/>
        </shell>
        <ok to="check-output"/>
        <error to="fail"/>
    </action>
    <decision name="check-output">
        <switch>
            <case to="end">
                ${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
            </case>
            <default to="fail-output"/>
        </switch>
    </decision>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
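One detail worth noting about this workflow: `<capture-output/>` only works because the command writes a Java-properties-style `key=value` line to stdout, which is what `wf:actionData('shell-node')['my_output']` reads in the decision node. A quick local sketch of that contract:

```shell
# The shell action's stdout must be key=value lines for
# wf:actionData('shell-node')['my_output'] to resolve.
out=$(echo 'my_output=Hello Oozie')   # what the <exec>/<argument> pair runs
key=${out%%=*}                        # text before the first '='
val=${out#*=}                         # text after the first '='
echo "$key=$val"                      # prints: my_output=Hello Oozie
```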

I don't see any messages in the oozie.log file that look particularly incriminating.
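In case it helps others diagnose this, the per-job views often say more than the server-wide oozie.log. A sketch, with `<job-id>` as a placeholder for the workflow ID printed at submission:

```shell
# Per-action status and transitions for one workflow job:
oozie job -info <job-id>

# The log stream scoped to that job only:
oozie job -log <job-id>
```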

Any thoughts or advice are much appreciated.

mindcrime

2 Answers


When there are not enough free slots in a node, the Oozie scheduler just keeps waiting for free slots. Check this for more details and how to increase the number of slots per node.
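On Hadoop 2.x the limiting resource is YARN container memory rather than classic MapReduce slots, but the symptom is the same: the launcher job Oozie submits sits waiting and the action never leaves PREP. A hedged way to check from the command line, assuming access to a node with the YARN client configured:

```shell
# If the Oozie launcher application is stuck in the ACCEPTED state,
# the cluster (or its queue) has no capacity to start it:
yarn application -list

# Per-node view of the cluster's NodeManagers and their state:
yarn node -list
```

The ResourceManager web UI shows the same information, including available memory per queue.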

From the information provided in the OP, this might or might not be the solution.

Praveen Sripati

A coordinator will be in PREP state when its start time is in the future; read more about coordinator states here.
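The start time and the coordinator's materialized actions can be checked from the CLI. A sketch, with `<coord-id>` as a placeholder for the coordinator job ID (note that coordinator times are interpreted as UTC by default):

```shell
# Shows the coordinator's status, start/end times, and the state of
# each materialized action; status stays PREP until the start time passes.
oozie job -info <coord-id>
```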

If you are using a coordinator, can you add the coordinator XML file?

Also, it would be helpful if you could paste the logs related to the stuck action.

Mzf
  • This happens even when using a simple Workflow action, with no coordinator. So the problem is something more basic than this. I'm familiar with the "start time in the future" issue, but in this case, the task hangs without even *getting* a startTime, as far as I can tell. – mindcrime Apr 25 '14 at 15:38
  • Having the exact same issue here. I was wondering if you have managed to solve that, and if it so, how. – FrancescoM Mar 09 '16 at 17:35