My goal is to schedule jobs with `EmrCreateJobFlowOperator` and `EmrAddStepsOperator`. Namely, I want to create a cluster and add steps for each scheduled day (or hour), starting from a specified date. Basically, I want `EmrAddStepsOperator` to be backfilled, but not `EmrCreateJobFlowOperator`. To achieve this, I thought I could use the sub-DAG concept, where the parent DAG has catch-up disabled and the child DAG has it enabled. I don't want to create an EMR cluster for each step. Is this possible? Are there any other options?
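One way to get "backfill only the steps" without sub-DAGs is to compute the missed days yourself and submit them all as steps to a single cluster. The following is a minimal, hypothetical plain-Python sketch that builds step definitions in the boto3 `add_job_flow_steps` format (the list of dicts that `EmrAddStepsOperator` takes as `steps`). The S3 script path, the `--date` argument, and the function names are placeholders, not something from the question.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical job script location; replace with your own.
SCRIPT = "s3://my-bucket/jobs/daily_job.py"


def daily_step(day: date, script: str = SCRIPT) -> Dict:
    """One EMR step (boto3 add_job_flow_steps format) that processes a single day."""
    return {
        "Name": f"daily_job_{day.isoformat()}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", script, "--date", day.isoformat()],
        },
    }


def steps_for_range(start: date, end: date) -> List[Dict]:
    """One step per day from start to end (both inclusive), all meant for one cluster."""
    if end < start:
        return []
    return [daily_step(start + timedelta(days=i)) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    # First run: catch up from the configured start date in a single batch of steps.
    for step in steps_for_range(date(2019, 1, 1), date(2019, 1, 3)):
        print(step["Name"])
```

On later runs, calling `steps_for_range(day, day)` yields just the single step for that day, so the same cluster can keep receiving one step per schedule interval.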

gorros
  • `catchup=False` is set at the DAG level. You could apply the logic of a `ShortCircuitOperator` or a `BranchPythonOperator` to the `EmrCreateJobFlowOperator` task, so that it only runs if the EMR cluster does not already exist. – pedram Feb 04 '19 at 20:08
  • I am not sure this solves the problem. I need to add several steps to one cluster on the first run and, after that, just a single step for each day. It seems this can't be done with Airflow concepts alone; it needs some additional development. – gorros Feb 05 '19 at 12:16
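The "only create the cluster if it does not exist" check suggested in the first comment can be sketched as a plain Python callable that a `BranchPythonOperator` could wrap (or, returning a boolean instead, a `ShortCircuitOperator`). This is a sketch under assumptions: the cluster list is taken to be the `Clusters` entries returned by boto3's `emr.list_clusters()`, and the cluster name `daily-emr` and task ids `create_job_flow` / `add_steps` are hypothetical.

```python
from typing import Dict, List, Optional

# Cluster states meaning the cluster is up or coming up (not terminating/terminated).
ACTIVE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}


def find_active_cluster_id(clusters: List[Dict], name: str) -> Optional[str]:
    """Return the Id of the first active cluster called `name`, or None."""
    for cluster in clusters:
        state = cluster.get("Status", {}).get("State")
        if cluster.get("Name") == name and state in ACTIVE_STATES:
            return cluster["Id"]
    return None


def choose_task(clusters: List[Dict], name: str = "daily-emr") -> str:
    """Branch callable: route to cluster creation only when no active cluster exists."""
    if find_active_cluster_id(clusters, name) is None:
        return "create_job_flow"
    return "add_steps"


if __name__ == "__main__":
    clusters = [
        {"Id": "j-OLD", "Name": "daily-emr", "Status": {"State": "TERMINATED"}},
        {"Id": "j-LIVE", "Name": "daily-emr", "Status": {"State": "WAITING"}},
    ]
    print(choose_task(clusters))      # add_steps
    print(choose_task(clusters[:1]))  # create_job_flow
```

For this to work across days, the cluster has to outlive its steps, for example by creating it with `KeepJobFlowAliveWhenNoSteps` set to `True` under `Instances`, and the `add_steps` task needs a trigger rule that still lets it run when the creation branch was skipped. The check alone does not cover the first-run catch-up described in the reply.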
