
I'm running MapReduce jobs using Oozie. From the workflow I'm only invoking the MapReduce driver class and nothing else. Even so, the Oozie workflow takes a lot of memory: it needs a container size of at least 2 GB just to invoke the driver class. Below is my workflow.xml:

<?xml version="1.0" encoding="utf-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="My Job">
<start to="start-job" />
<action name='start-job'>
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${jobQueue}</value>
            </property>
        </configuration>
        <exec>${jobScript}</exec>
        <argument>${arguments}</argument>
        <argument>${queueName}</argument>
        <argument>${wf:id()}</argument>
        <file>myPath/MyDriver.sh#MyDriver.sh</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
<kill name="kill">
    <message>Job failed
        failed:[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>

My shell script (MyDriver.sh) looks like this:

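#!/bin/bash
# Launch the MapReduce driver; quote the arguments so values containing spaces are passed through intact
hadoop jar myJar.jar MyDriverClass "$1" "$2" "$3"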

Why does Oozie take so much memory, and how can I reduce its memory consumption?

Vijayakumar

1 Answer


A shell action starts at least two mappers to run your Java class.

You can avoid this by using a java action instead. Put your jar inside the ${workflow-path}/lib/ directory and change your workflow as follows:

<action name='start-job'>
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${jobQueue}</value>
            </property>
        </configuration>
        <main-class>MyDriverClass</main-class>

        <arg>${arguments}</arg>
        <arg>${queueName}</arg>
        <arg>${wf:id()}</arg>
    </java>
    <ok to="end" />
    <error to="kill" />
</action>
Ruslan Ostafiichuk
  • Also, if you insist on using Oozie to launch a Shell to launch a MapReduce job *(why not, maybe it's good for karma)*, then you can try to reduce the RAM usage of the initial Shell by setting, in the action's `<configuration>`, the properties `oozie.launcher.mapreduce.map.memory.mb` and `oozie.launcher.yarn.app.mapreduce.am.resource.mb` to, say, 512 MB. – Samson Scharfrichter Feb 01 '16 at 19:17
  • Ruslan, I can't do that, because I have many related jobs. I built a single fat jar and shared it in a common path; the classes for whichever job is triggered are loaded from that jar. – Vijayakumar Feb 02 '16 at 05:08
  • @Vijay, you can specify `oozie.libpath` in your job.properties and put the single jar into that path. – Ruslan Ostafiichuk Feb 02 '16 at 16:31
  • @Ruslan, one more thing I noticed when running the job with the java action: it is launched as a local job, not as an MR job. Before, it was launched as MR; now it is not. Why is that? – Vijayakumar Feb 03 '16 at 07:25
  • @Vijay, I have also used the java action to start MR jobs and had no issues with it. I may be able to help if you add more details. Alternatively, you can write a map-reduce action directly in the Oozie workflow: https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook – Ruslan Ostafiichuk Feb 03 '16 at 08:46
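For reference, here is a sketch of the shell action from the question with the two launcher properties from Samson Scharfrichter's comment added to its `<configuration>`. The 512 MB values are only the example figure from that comment, not a tested recommendation; the right size depends on your cluster and on how much memory the launched script actually needs.

```xml
<action name='start-job'>
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${jobQueue}</value>
            </property>
            <!-- Container size (MB) for the Oozie launcher's map task; 512 is an example value -->
            <property>
                <name>oozie.launcher.mapreduce.map.memory.mb</name>
                <value>512</value>
            </property>
            <!-- Container size (MB) for the launcher job's ApplicationMaster; 512 is an example value -->
            <property>
                <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                <value>512</value>
            </property>
        </configuration>
        <exec>${jobScript}</exec>
        <argument>${arguments}</argument>
        <argument>${queueName}</argument>
        <argument>${wf:id()}</argument>
        <file>myPath/MyDriver.sh#MyDriver.sh</file>
    </shell>
    <ok to="end" />
    <error to="kill" />
</action>
```

If the launcher's default JVM heap is larger than the smaller container, YARN may kill the task; in that case you may also need a smaller `-Xmx` via `oozie.launcher.mapreduce.map.java.opts` (assuming your Oozie version forwards `oozie.launcher.*` properties to the launcher job).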