An Oozie workflow is a sequence of actions, typically Hadoop MapReduce jobs, managed by the Oozie scheduler system.
Questions tagged [oozie-workflow]
190 questions
4
votes
1 answer
Workflow scheduling on GCP Dataproc cluster
I have some complex Oozie workflows to migrate from on-prem Hadoop to GCP Dataproc. The workflows consist of shell scripts, Python scripts, Spark-Scala jobs, Sqoop jobs, etc.
I have come across some potential solutions incorporating my workflow…

Balajee Venkatesh
4
votes
0 answers
Run Docker container through Oozie
I'm trying to build an Oozie workflow that runs a Python script every day; the script needs specific libraries to run.
So far I have created a Python virtual environment (using venv) on one node of my cluster (which consists of 11 nodes).
Through Oozie I saw…

AGL
4
votes
0 answers
YARN - log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException (Is a directory)
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /usr/local/hadoop/logs/userlogs/application_1502783189107_0007/container_1502783189107_0007_01_000001 (Is a directory)
Please provide me a suggestion when I am running…

overflow
4
votes
1 answer
oozie workflow spark launch job on a particular queue
I have an oozie configuration:
${jobTracker}
${nameNode}
…

Alessandro La Corte
3
votes
1 answer
Can Apache Oozie run docker containers?
Currently comparing DAG-based workflow tools like Airflow and Luigi for scheduling generic docker containers as well as Spark jobs.
Can Apache Oozie run generic Docker containers through its shell action? Or is Oozie strictly meant for Hadoop tools…

Kermit
3
votes
1 answer
oozie fs action against S3 not updating keys in MANIFESTS (DynamoDB metastore - emrfs not in sync) for S3 storage
Going by theory, on running hdfs commands using HDFS CLI,
hdfs dfs -touchz s3://bucketname/folder/file
it goes through EMRFS and it updates the key in MANIFESTS in dynamodb while creating S3 entry.
emrfs diff - says, both in S3 & MANIFESTS…

Viswa
3
votes
1 answer
Oozie workflow warning - "The application does not define formal parameters in its XML definition"
What is the meaning of the org.apache.oozie.util.ParameterVerifier warning "The application does not define formal parameters in its XML definition" ?

Jenny
2
votes
1 answer
How to use GCS bucket as workflow file source for Oozie in Dataproc
We're migrating our EMR cluster to Dataproc, and we're relying on Oozie to run our workflows. The first challenge is how to load the workflow.xml from a Cloud Storage bucket. We used to do it using…

Bruno Moreira
2
votes
0 answers
Oozie Job log unable to view println statements
I have a few System.out.println() statements in my Java code that I am trying to view in the job scheduled using Oozie. I'm unable to find those println() statements despite the Job status showing as SUCCESSFUL.
My workflow.xml looks…

Sparker0i
2
votes
2 answers
OOZIE workflow.xml No function is mapped to the name coord:nominalTime
I'm using Oozie's SLA feature. I'm trying to use ${coord:nominalTime()} for nominal time, but it throws an error when I schedule the workflow:
E0803 : E0803: IO error, E1004: Expression language evaluation error, Validation error :No function is…

Aryan087
2
votes
1 answer
java.lang.IllegalArgumentException: Attempt to add ([custom-jar-with-spark-code].jar) multiple times to the distributed cache
I am trying to run a simple Java Spark job using Oozie on an EMR cluster. The job just takes files from an input path, performs a few basic actions on them, and places the result in a different output path.
When I try to run it from command line using…

pallupz
2
votes
0 answers
Sqoop Fail In Hue Workflow
When the following Sqoop import is run in the command shell, it works well.
import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera -m 1 --table categories --hive-database retail_stage --hive-table…

Yunus Einsteinium
2
votes
1 answer
retry-max value in oozie 4.2.0 version
I have the HDP version of Oozie 4.2.0, and I want to use 'Max-retries' for my spark action as well as my shell action.
When I submit the workflow, after the ERROR state it goes to the USER-RETRY state and then retries again.
When I look into oozie -info for that…

Mohit Rane
2
votes
3 answers
cant run shell in oozie ( error=2, No such file or directory )
I created a workflow in the Ambari Views UI for Oozie, with a sample.sh file in my workflow.
After running it I get an error. When I change the body of the shell script to a simple command, for example echo 1, the error does not appear.
Please advise.
2:34,752 WARN…

ahmad
2
votes
1 answer
oozie spark action table not found
I am trying to set up a Spark action workflow within Apache Oozie, but I'm getting the following error when select * from db.table is called within my Spark code in a Hive context:
org.apache.spark.sql.AnalysisException: Table not found:…

user6666914