6

I have 2 Sqoops that loads data from HDFS to MySQL. I want to execute them using Oozie. I have seen that Oozie is an XML file. How can I configure it so I can execute those Sqoop? Demonstration with steps will be appreciated?

Two Sqoops are:

1.

sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1

2.

sqoop export --connect jdbc:mysql://localhost/hduser --table foo2 -m 1 --export-dir /user/cloudera/bar2

Thanks.

Rio
  • 765
  • 3
  • 17
  • 37

2 Answers2

7

You don't have to execute it via a shell action. There is a separate sqoop action in oozie. Here is what you have to put in your workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie-wf">
    <start to="sqoop-wf1"/>
    <action name="sqoop-wf1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1</command>
        </sqoop>
        <ok to="sqoop-wf2"/>
        <error to="fail"/>
    </action> 
    <action name="sqoop-wf2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar2</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action> 
    <kill name="fail">
        <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Hope this helps..

DMA
  • 1,033
  • 1
  • 11
  • 22
  • Hi, Thank you very much. How do I execute workflow.xml? Where do I put workflow.xml file? Where do I define parameters for "${jobTracker}"? I am new to Oozie and your help will be appreciated. – Rio Apr 02 '14 at 14:08
  • jobTracker and nameNode is parametarized here meaning it should be defined in job.properties file. Seems like you have not gone through the examples. Check the apache oozie website. They have provided good examples. Here is the link. [https://oozie.apache.org/docs/3.3.2/DG_Examples.html](https://oozie.apache.org/docs/3.3.2/DG_Examples.html) – DMA Apr 02 '14 at 16:41
  • Ok thanks. My Hadoop version is: Hadoop 2.0.0-cdh4.3.1. Can I use sqoop action in oozie for this version of Hadoop? – Rio Apr 02 '14 at 18:04
  • thanks for your support. One last thing I would like to ask is, before the sqoop command gets executed (to load data from Hive/HDFS to MySQL), I have to load data to the HDFS, so what I do is: hive -hiveconf DATE='2014-04-01' -f bar.hql and it loads data into the HDFS e.g. /user/cloudera/bar1. Where should I put "hive -hiveconf DATE='2014-04-01' -f bar.hql" statement in Oozie? Thanks – Rio Apr 04 '14 at 14:53
  • You can use Hive Action before Sqooping. Go through this example. [http://oozie.apache.org/docs/3.3.2/DG_HiveActionExtension.html](http://oozie.apache.org/docs/3.3.2/DG_HiveActionExtension.html) – DMA Apr 04 '14 at 15:51
  • Can I use Hive action and Sqoop action together? If yes, could you put it as new answer? Thank you! – Rio Apr 04 '14 at 18:21
  • I have also other questions, if you could kindly answer: http://stackoverflow.com/questions/22920431/questions-about-oozie-sqoop – Rio Apr 07 '14 at 18:39
0

You can use an Oozie shell action for this. Basically you need to create a shell action & provide the commands that you posted in your question as the commands to be executed within the action

Sample Oozie action:

 <action name="SqoopAction">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>[JOB-TRACKER]</job-tracker>
        <name-node>[NAME-NODE]</name-node>
        <prepare>
           <delete path="[PATH]"/>
           ...
           <mkdir path="[PATH]"/>
           ...
        </prepare>
        <job-xml>[SHELL SETTINGS FILE]</job-xml>
        <configuration>
            <property>
                <name>[PROPERTY-NAME]</name>
                <value>[PROPERTY-VALUE]</value>
            </property>
            ...
        </configuration>
        <exec>[SHELL-COMMAND]</exec>
        <argument>[ARG-VALUE]</argument>
            ...
        <argument>[ARG-VALUE]</argument>
        <env-var>[VAR1=VALUE1]</env-var>
           ...
        <env-var>[VARN=VALUEN]</env-var>
        <file>[FILE-PATH]</file>
        ...
        <archive>[FILE-PATH]</archive>
        ...
        <capture-output/>
    </shell>

In your case, you would replace [SHELL-COMMAND] with whatever Sqoop command you want to run, such as:

<exec>sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1</exec>

Also, you could put all your sqoop commands in a shell script, and execute that script instead. This is better if you have a lot of commands to be executed.

Chaos
  • 11,213
  • 14
  • 42
  • 69
  • Hi, Thanks for the reply. Where do I put this XML file (say SqoopAction.xml)? How do I execute this XML file? What do I put for [PATH], [SHELL SETTINGS FILE], [FILE-PATH], ...? I am using "Cloudera QuickStart VM" – Rio Apr 01 '14 at 22:12
  • Please read through this guide: https://blogs.oracle.com/datawarehousing/entry/building_simple_workflows_in_oozie to learn about how to execute oozie workflows through workflow.xml. Also, the example I posted is generic, so there might be parameters which you may not require, so feel free to exclude them from your workflow – Chaos Apr 01 '14 at 22:24