I want to schedule my Spark batch jobs from NiFi. I can see there is an ExecuteSparkInteractive processor which submits Spark jobs to Livy, but it executes the code provided in the property or from the content of the incoming flow file. How should I schedule my Spark batch jobs from NiFi, and also take different actions depending on whether the batch job fails or succeeds?
1 Answer
You could use ExecuteProcess to run a spark-submit command. But what you seem to be looking for is not a dataflow management tool, but a workflow manager. Two great examples of workflow managers are Apache Oozie and Apache Airflow.
If you still want to use NiFi to schedule Spark jobs, you can use the GenerateFlowFile processor as the scheduler (run it on the primary node only, so the job isn't triggered twice, unless that's what you want), and then connect it to the ExecuteProcess processor, which runs the spark-submit command.
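As a sketch of the ExecuteProcess route: the processor's Command and Command Arguments properties would point at a spark-submit invocation along these lines. The master, deploy mode, class name, and jar path below are placeholders, not values from the answer. Wrapping the call in a small script that propagates the exit code makes the outcome visible outside of NiFi's logs:

```shell
#!/bin/sh
# Hypothetical wrapper script invoked by ExecuteProcess.
# --class and the jar path are placeholders for your own batch job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyBatchJob \
  /opt/jobs/my-batch-job.jar "$@"
status=$?
echo "spark-submit exited with code $status"
exit $status
```

Note that ExecuteProcess only has a success relationship, so it can't branch on failure by itself. If you need to route differently on success vs. failure, ExecuteStreamCommand is often a better fit: it writes the process exit code to the `execution.status` attribute, which a downstream RouteOnAttribute processor can inspect.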
For a slightly more complex workflow, I've written an article about it :) Hope it helps.

Ben Yaakobi
Nice article. I already did that a few months ago and my approach is similar to yours. I wrote a custom processor to join based on conditions. – Apurba Pandey Mar 26 '19 at 01:50