Questions tagged [oozie]

Oozie is a workflow/coordination system for managing Hadoop MapReduce jobs.

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
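To illustrate what the DAG requirement means in practice, here is a minimal Python sketch that models a workflow as a mapping from each action to the actions it transitions to, then derives a valid run order. It is not how Oozie itself is implemented (real workflows are defined in XML), and the action names and the helper functions `execution_order` and `is_valid_workflow` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def execution_order(transitions):
    """Return one valid run order for a workflow given as {action: [next actions]}."""
    # TopologicalSorter expects {node: predecessors}, so invert the transitions.
    predecessors = {action: set() for action in transitions}
    for action, next_actions in transitions.items():
        for nxt in next_actions:
            predecessors.setdefault(nxt, set()).add(action)
    # static_order() raises CycleError if the graph contains a cycle.
    return list(TopologicalSorter(predecessors).static_order())


def is_valid_workflow(transitions):
    """A workflow is only valid if its actions form an acyclic graph."""
    try:
        execution_order(transitions)
        return True
    except CycleError:
        return False


# Hypothetical workflow: import data, transform it, then finish.
workflow = {
    "start": ["import"],
    "import": ["transform"],
    "transform": ["end"],
    "end": [],
}
print(execution_order(workflow))
print(is_valid_workflow({"a": ["b"], "b": ["a"]}))
```

A graph in which an action eventually transitions back to itself has no valid run order, which is why the second call reports the workflow as invalid.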

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
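The two triggers, time and data availability, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Oozie's actual behavior: it assumes the end time is exclusive, and the path layout `/data/%Y/%m/%d` and the functions `nominal_times` and `ready_to_run` are hypothetical. A real coordinator is declared in XML rather than in code.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """List the scheduled run times from start up to (not including) end."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


def ready_to_run(nominal_time, available_datasets, uri_template="/data/%Y/%m/%d"):
    """A scheduled run only fires once the dataset for its time exists."""
    # The template and the set of available paths are illustrative assumptions.
    return nominal_time.strftime(uri_template) in available_datasets


# Hypothetical daily schedule with only one day's data present so far.
schedule = nominal_times(datetime(2024, 1, 1), datetime(2024, 1, 4), timedelta(days=1))
available = {"/data/2024/01/02"}
runnable = [t for t in schedule if ready_to_run(t, available)]
print(runnable)
```

Only the run whose input data has arrived is selected; the others remain waiting until their datasets appear.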

Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

1929 questions
6
votes
1 answer

Oozie coordinator with Asynchronous Data Set

We want to schedule a workflow based on data availability but there is no particular frequency of data arrival. Also there could be multiple re-runs of data and hence multiple versions of the data for the day arriving at any time. As I understand…
Vishal Joshi
  • 161
  • 1
  • 2
  • 6
6
votes
1 answer

Oozie Java Action : Passing Hbase classpath

I'm running a test hbase java program via oozie java action. The following error is encountered : Failing Oozie Launcher, Main class [HbaseTest], main() threw exception, org/apache/hadoop/hbase/HBaseConfiguration java.lang.NoClassDefFoundError:…
NZ.
  • 75
  • 1
  • 5
6
votes
2 answers

Oozie shell action memory limit

We have an oozie workflow with a shell action that needs more memory than what a map task is given by Yarn by default. How can we give it more memory? We have tried adding the following configuration to the action:
Thomas Larsson
  • 697
  • 1
  • 8
  • 17
6
votes
8 answers

Running shell script from oozie through Hue

I am invoking a bash shell script using the Oozie editor in Hue. I used the shell action in the workflow and tried the following options in the shell command: uploaded the shell script using 'choose a file'; gave the local directory path where the shell script is…
Sourabh Potnis
  • 1,431
  • 1
  • 17
  • 26
6
votes
2 answers

Executing Sqoops using Oozie

I have 2 Sqoop jobs that load data from HDFS to MySQL. I want to execute them using Oozie. I have seen that Oozie is configured with an XML file. How can I configure it so I can execute those Sqoop jobs? A demonstration with steps would be appreciated. Two Sqoops…
Rio
  • 765
  • 3
  • 17
  • 37
6
votes
2 answers

Oozie shell script action

I am exploring the capabilities of Oozie for managing Hadoop workflows. I am trying to set up a shell action which invokes some hive commands. My shell script hive.sh looks like: #!/bin/bash hive -f hivescript Where the hive script (which has been…
thedragonwarrior
  • 71
  • 1
  • 1
  • 7
6
votes
1 answer

Concurrency in running Oozie workflow: how many and how to throttle

Let us say we have an Oozie workflow that has a copy action node followed by a shell action node. Can I start multiple instances of such an Oozie workflow and run them in parallel? What if the concurrency number could spike to thousands and/or even…
user908645
  • 317
  • 2
  • 3
  • 14
6
votes
1 answer

How can I provide a password to SQOOP through OOZIE to connect to MS-SQL?

I'm exporting information from HDFS into MS-SQL using SQOOP. I'm running SQOOP through OOZIE. Right now I've hard-coded the uid, pwd for the jdbc connection in the OOZIE workflow. Once I switch to prod I won't be able to do this. What is the…
hba
  • 7,406
  • 10
  • 63
  • 105
6
votes
2 answers

Specifying multiple filter criteria through Oozie command line

I am trying to search for some specific Oozie jobs through the command line. I am using the following syntax: $ oozie jobs -filter status=RUNNING ;status=KILLED However, the command only returns jobs which are RUNNING and not the KILLED…
fuRy
  • 81
  • 1
  • 4
6
votes
3 answers

Getting E0902: Exception occured: [User: oozie is not allowed to impersonate oozie]

Hi, I am new to Oozie and I am getting this error: E0902: Exception occured: [User: pramod is not allowed to impersonate pramod] when I run the following command: ./oozie job -oozie http://localhost:11000/oozie/ -config ~/Desktop/map-reduce …
Pramod
  • 493
  • 1
  • 8
  • 16
6
votes
4 answers

Oozie + Sqoop: JDBC Driver Jar Location

I have a 6 node cloudera based hadoop cluster and I'm trying to connect to an oracle database from a sqoop action in oozie. I have copied my ojdbc6.jar into the sqoop lib location (which for me happens to be at:…
nemo
  • 1,504
  • 3
  • 21
  • 41
6
votes
2 answers

Oozie workflow: Hive table not found but it does exist

I have an Oozie workflow running on a CDH4 cluster of 4 machines (one master-for-everything, three "dumb" workers). The Hive metastore runs on the master using MySQL (driver is present), and the Oozie server also runs on the master using MySQL.…
Mario Mueller
  • 1,450
  • 2
  • 13
  • 16
6
votes
2 answers

How do I get more specific error info on killed job in Oozie

I have a hadoop map-reduce job running as a step in Oozie workflow. It is started using java action which implements org.apache.hadoop.util.Tool. When the job is being killed for some reason I want to be able to email a notification which should…
Art
  • 1,302
  • 13
  • 25
5
votes
2 answers

What is the correct way to use oozie to write to multiple output streams for a mapreduce job?

I'm using the new Hadoop API to write a sequence of map-reduce jobs. I plan to use Oozie to pipeline all of these together, but I can't seem to find a way to do multiple output streams from a map-reduce node in the workflow. Normally to write…
coltfred
  • 1,470
  • 9
  • 17
5
votes
2 answers

Oozie/yarn: resource changed on src filesystem

I have an Oozie workflow in which one of the steps is a java step that runs a jar stored on the local filesystem (the jar is present on all nodes). Initially, the jar was installed via an RPM, so all copies have the same timestamp. While experimenting, I…
Guillaume
  • 2,325
  • 2
  • 22
  • 40