I'm trying to run an Oozie coordinator whose workflow is a single java action that runs a Camus mapper job. The coordinator appears to run and starts the workflow every 20 minutes, but the workflow then runs indefinitely, even though the same job completes in a few minutes when run on its own. I suspect the problem is in how the job is launched or how the arguments are passed, but I'm not sure how to debug it. Here is the code:

/coord/job.properties

oozie.coord.application.path=hdfs://10.0.2.15:8020/user/hue/app/coord/coordinator.xml 
name=camus 
frequency=20 
start=2015-07-30T11:40Z 
end=2016-07-30T11:40Z 
timezone=GMT+0530
workflow=hdfs://10.0.2.15:8020/user/hue/app/workflow/workflow.xml

nameNode=hdfs://10.0.2.15:8020 
jobTracker=10.0.2.15:8021
queueName=default
properties=${nameNode}/user/hue/app/workflows/lib/config.properties

/coord/coordinator.xml

<coordinator-app name="${name}" frequency="${frequency}" start="${start}" end="${end}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.1">
   <action>
      <workflow>
         <app-path>${workflow}</app-path>
      </workflow>
   </action>     
</coordinator-app>

/workflow/workflow.xml

<workflow-app xmlns='uri:oozie:workflow:0.4' name='camus-wf'>
    <start to='camus_job' />
    <action name='camus_job'>
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.linkedin.camus.etl.kafka.CamusJob</main-class>
            <arg>-P</arg>
            <arg>${properties}</arg>
        </java>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Camus Job Failed</message>
    </kill>
    <end name='end' />
</workflow-app>
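
Note that queueName is defined in job.properties but is not referenced anywhere in the workflow above. A minimal sketch of how it could be passed to the java action, assuming the standard Hadoop property mapred.job.queue.name (the `<configuration>` block is the only addition; whether this has any bearing on the stuck action is not established):

```xml
<java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Assumption: submit the launcher job to the queue named in job.properties -->
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <main-class>com.linkedin.camus.etl.kafka.CamusJob</main-class>
    <arg>-P</arg>
    <arg>${properties}</arg>
</java>
```

In the workflow schema, `<configuration>` has to come before `<main-class>` inside `<java>`, which is why it is placed there.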

The shaded jar and config.properties are both located in /workflow/lib/.

I'm running HDP 2.2.

Coordinator Logs:

2015-08-03 06:43:43,820  INFO CoordSubmitXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[-] ENDED Coordinator Submit jobId=0000000-150803063131195-oozie-oozi-C
2015-08-03 06:43:43,935  INFO CoordMaterializeTransitionXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[-] materialize actions for tz=Coordinated Universal Time,
 start=Thu Jul 30 11:40:00 UTC 2015, end=Thu Jul 30 15:40:00 UTC 2015,
 timeUnit 12,
 frequency :20:MINUTE,
 lastActionNumber 0
2015-08-03 06:43:43,971  INFO CoordMaterializeTransitionXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[-] [0000000-150803063131195-oozie-oozi-C]: Update status from PREP to RUNNING
2015-08-03 06:43:44,113  INFO CoordActionInputCheckXCommand:543 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[0000000-150803063131195-oozie-oozi-C@1] [0000000-150803063131195-oozie-oozi-C@1]::CoordActionInputCheck:: Missing deps: 
2015-08-03 06:43:44,209  INFO CoordActionNotificationXCommand:543 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[0000000-150803063131195-oozie-oozi-C@1] STARTED Coordinator Notification actionId=0000000-150803063131195-oozie-oozi-C@1 : WAITING
...
2015-08-03 06:43:44,267  INFO CoordActionNotificationXCommand:543 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[0000000-150803063131195-oozie-oozi-C@12] No Notification URL is defined. Therefore nothing to notify for job 0000000-150803063131195-oozie-oozi-C action ID 0000000-150803063131195-oozie-oozi-C@12
2015-08-03 06:43:44,268  INFO CoordActionNotificationXCommand:543 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[0000000-150803063131195-oozie-oozi-C@12] ENDED Coordinator Notification actionId=0000000-150803063131195-oozie-oozi-C@12
2015-08-03 06:43:44,433  WARN ParameterVerifier:546 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-150803063131195-oozie-oozi-C] ACTION[0000000-150803063131195-oozie-oozi-C@1] The application does not define formal parameters in its XML definition
...

Workflow Logs:

2015-08-03 06:43:44,672  INFO ActionStartXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus-wf] JOB[0000001-150803063131195-oozie-oozi-W] ACTION[0000001-150803063131195-oozie-oozi-W@:start:] Start action [0000001-150803063131195-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2015-08-03 06:43:44,673  INFO ActionStartXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus-wf] JOB[0000001-150803063131195-oozie-oozi-W] ACTION[0000001-150803063131195-oozie-oozi-W@:start:] [***0000001-150803063131195-oozie-oozi-W@:start:***]Action status=DONE
2015-08-03 06:43:44,673  INFO ActionStartXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus-wf] JOB[0000001-150803063131195-oozie-oozi-W] ACTION[0000001-150803063131195-oozie-oozi-W@:start:] [***0000001-150803063131195-oozie-oozi-W@:start:***]Action updated in DB!
2015-08-03 06:43:45,104  INFO ActionStartXCommand:543 - SERVER[sandbox.hortonworks.com] USER[root] GROUP[-] TOKEN[] APP[camus-wf] JOB[0000001-150803063131195-oozie-oozi-W] ACTION[0000001-150803063131195-oozie-oozi-W@camus_job] Start action [0000001-150803063131195-oozie-oozi-W@camus_job] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]

Comments:

  • What do the logs say for all services involved? – cjackson Aug 03 '15 at 03:42
  • @cjackson added the logs. Also on the web console, the Coordinator shows 48 concurrent actions, with the 1st one RUNNING, and the other ones READY, while the workflow job has its start action end with status OK, but is stuck on its camus_job with status PREP – Jake Chase Aug 03 '15 at 07:01
  • I ran into the same issue today and could not find a solution. – Jacky Dec 25 '15 at 02:18

0 Answers