My goal is to schedule jobs with `EmrCreateJobFlowOperator` and `EmrAddStepsOperator`. Namely, I want to create a cluster and add steps for each scheduled day (or hour), starting from a specified date. Basically, I want `EmrAddStepsOperator` to be back-filled, but not `EmrCreateJobFlowOperator`. To achieve this, I thought I could use the SubDAG concept, where the parent DAG has catch-up disabled and the child DAG has it enabled. I don't want to create an EMR cluster for each step.
Is this possible? Are there any other options?
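A minimal sketch of the scheduling logic being asked for, assuming the date of the last submitted step is persisted somewhere between runs (for example in an Airflow Variable). It only builds the list of step dicts, in the shape `EmrAddStepsOperator` forwards to EMR's `add_job_flow_steps`, and does not touch Airflow or AWS. On the first run it back-fills every day since the start date; on later runs it normally yields one step. The script URI, step names, and `--date` argument are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Hypothetical job artifact; replace with your own script location.
SCRIPT_URI = "s3://my-bucket/jobs/daily_job.py"


def step_for_day(day: date) -> Dict:
    """Build one EMR step definition for a single processing day."""
    return {
        "Name": f"daily_job_{day.isoformat()}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Hypothetical CLI: the script is assumed to accept --date.
            "Args": ["spark-submit", SCRIPT_URI, "--date", day.isoformat()],
        },
    }


def steps_to_submit(start: date, run_day: date, last_submitted: Optional[date]) -> List[Dict]:
    """Return steps for every day not yet submitted, up to and including run_day.

    If nothing has been submitted yet (last_submitted is None), back-fill from start.
    """
    day = start if last_submitted is None else last_submitted + timedelta(days=1)
    steps = []
    while day <= run_day:
        steps.append(step_for_day(day))
        day += timedelta(days=1)
    return steps


if __name__ == "__main__":
    # First run: four back-filled steps for Feb 1 to Feb 4.
    print([s["Name"] for s in steps_to_submit(date(2019, 2, 1), date(2019, 2, 4), None)])
    # Later run: a single step for Feb 5.
    print([s["Name"] for s in steps_to_submit(date(2019, 2, 1), date(2019, 2, 5), date(2019, 2, 4))])
```

With this approach the DAG itself can keep `catchup=False`; the back-fill happens inside the list of steps passed to a single cluster rather than through separate DAG runs.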
Asked by gorros
- `catchup=False` is set at the DAG level. You can try guarding the `EmrCreateJobFlowOperator` task with a `ShortCircuitOperator` or a `BranchPythonOperator`, so that it only runs if the EMR cluster does not already exist. – pedram Feb 04 '19 at 20:08
- I am not sure this solves the problem. I need to add several steps to one cluster on the first run and, after that, just a single step for each day. It seems this can't be done with Airflow concepts alone and requires some additional development. – gorros Feb 05 '19 at 12:16
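The branching idea from the first comment can be sketched as a plain Python callable for a `BranchPythonOperator`. This is a minimal sketch under stated assumptions: the cluster list is expected in the shape returned by boto3's `emr.list_clusters(...)["Clusters"]`, the clusters are matched by a fixed name, and the task ids `create_emr_cluster` and `add_steps` are hypothetical. Fetching the clusters and wiring the callable into a DAG are left out so the logic stays testable on its own.

```python
from typing import Dict, List, Optional

# EMR cluster states in which the cluster can still accept new steps.
ACTIVE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}

# Hypothetical task ids for the two branches of the DAG.
CREATE_TASK_ID = "create_emr_cluster"
ADD_STEPS_TASK_ID = "add_steps"


def find_active_cluster(clusters: List[Dict], name: str) -> Optional[str]:
    """Return the Id of an active cluster called `name`, or None.

    Each entry is assumed to look like a boto3 list_clusters item:
    {"Id": ..., "Name": ..., "Status": {"State": ...}}.
    """
    for cluster in clusters:
        if cluster["Name"] == name and cluster["Status"]["State"] in ACTIVE_STATES:
            return cluster["Id"]
    return None


def choose_branch(clusters: List[Dict], name: str) -> str:
    """Pick the next task: create a cluster only if no active one exists."""
    if find_active_cluster(clusters, name) is None:
        return CREATE_TASK_ID
    return ADD_STEPS_TASK_ID
```

A branch is used here rather than a `ShortCircuitOperator` because short-circuiting would also skip the downstream add-steps task. If `add_steps` sits downstream of both the branch and the create task, it typically needs a trigger rule such as `none_failed` so that it still runs when the create branch is skipped, and it has to obtain the cluster id (for example via XCom) from whichever path ran.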