
It seems that Apache Oozie does not currently support Spark jobs; am I right? Is there any way to integrate Spark jobs into Oozie?

HHH
possible duplicate of [launching a spark program using oozie workflow](http://stackoverflow.com/questions/29233487/launching-a-spark-program-using-oozie-workflow) – Joe Kennedy Jul 27 '15 at 19:13

2 Answers


You can always run Spark as a Java action that invokes `org.apache.spark.deploy.SparkSubmit`, as shown below. Alternatively, you can use Oozie's spark action. Its details are in the schema at https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd

<java>
    <main-class>org.apache.spark.deploy.SparkSubmit</main-class>

    <arg>--class</arg>
    <arg>${spark_main_class}</arg>

    <arg>--deploy-mode</arg>
    <arg>cluster</arg>

    <arg>--master</arg>
    <arg>yarn</arg>

    <!-- the queue name depends on your Oozie/YARN config -->
    <arg>--queue</arg>
    <arg>${queue_name}</arg>

    <arg>--num-executors</arg>
    <arg>${spark_num_executors}</arg>

    <arg>--executor-cores</arg>
    <arg>${spark_executor_cores}</arg>

    <!-- the application jar must come after the spark-submit options -->
    <arg>${spark_app_file}</arg>

    <!-- arguments passed to your application's main method -->
    <arg>${input}</arg>
    <arg>${output}</arg>

    <!-- ship the application jar and the Spark assembly with the action -->
    <file>${spark_app_file}</file>
    <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
</java>
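If your Oozie build ships the spark action, the same job could be written with it instead. The sketch below assumes the `uri:oozie:spark-action:0.1` schema linked above. The `${job_tracker}` parameter, the action name, and the `end`/`fail` transition targets are hypothetical. The remaining parameters reuse the names from the Java action.

```xml
<action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <!-- ${job_tracker} is a hypothetical parameter name -->
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>MySparkJob</name>
        <class>${spark_main_class}</class>
        <jar>${spark_app_file}</jar>
        <!-- extra spark-submit options go here as a single string -->
        <spark-opts>--queue ${queue_name} --num-executors ${spark_num_executors} --executor-cores ${spark_executor_cores}</spark-opts>
        <arg>${input}</arg>
        <arg>${output}</arg>
    </spark>
    <!-- "end" and "fail" are hypothetical node names in your workflow -->
    <ok to="end"/>
    <error to="fail"/>
</action>
```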
Karthik

Oozie support for Spark is coming (see the Jira), but it is currently available only in trunk.

Until then, your options are to run Spark as a Java action or as a Shell action.
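As a rough sketch, a Shell action can call `spark-submit` from a wrapper script. The following names are all hypothetical:

- `run_spark.sh`
- `${script_path}`
- `${job_tracker}`
- the `end`/`fail` transitions

The script is assumed to run `spark-submit` with your options. This means `spark-submit` must be installed on whichever cluster node Oozie launches the action on.

```xml
<action name="spark-shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <!-- hypothetical wrapper script that calls spark-submit -->
        <exec>run_spark.sh</exec>
        <argument>${input}</argument>
        <argument>${output}</argument>
        <!-- copy the script from HDFS into the action's working directory -->
        <file>${script_path}#run_spark.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```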

dpeacock