
From the airflow documentation:

SubDAGs must have a schedule and be enabled. If the SubDAG’s schedule is set to None or @once, the SubDAG will succeed without having done anything

I understand the SubDagOperator is actually implemented as a BackfillJob and thus we must provide a schedule_interval to the operator. However, is there a way to get the semantic equivalent of schedule_interval="@once" for a subdag? I'm worried that if I set schedule_interval="@daily" for the subdag, the subdag may run more than once if it takes longer than a day to complete.

from airflow.models import DAG

def subdag_factory(parent_dag_name, child_dag_name, args):
    subdag = DAG(
        dag_id="{parent_dag_name}.{child_dag_name}".format(
            parent_dag_name=parent_dag_name, child_dag_name=child_dag_name
        ),
        schedule_interval="@daily",  # <--- this bit here
        default_args=args
    )

    # ... do more stuff to the subdag here
    return subdag
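
For context, the factory above gets attached to the parent DAG roughly like this (a sketch only; the parent DAG id, task id, and start date are placeholder assumptions, not my actual code):

```python
# Hypothetical wiring of subdag_factory into a parent DAG.
# "parent_dag_name", "child_dag_name", and the start_date are assumptions.
from datetime import datetime

from airflow.models import DAG
from airflow.operators.subdag_operator import SubDagOperator

default_args = {"owner": "airflow", "start_date": datetime(2017, 1, 1)}

parent_dag = DAG(
    dag_id="parent_dag_name",
    schedule_interval="@daily",
    default_args=default_args,
)

subdag_task = SubDagOperator(
    # task_id must match the suffix of the subdag's dag_id
    # ("parent_dag_name.child_dag_name")
    task_id="child_dag_name",
    subdag=subdag_factory("parent_dag_name", "child_dag_name", default_args),
    dag=parent_dag,
)
```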

TLDR: how to fake out "only run this subdag once per trigger of the parent dag"

gnicholas

2 Answers


I find that schedule_interval="@once" works just fine for my subdags. Perhaps my version is outdated, but I've had more issues with my subdags failing even when all tasks succeeded (or were skipped) than the opposite.

Actual example code running quite happily live on my machine right now:

import logging

from airflow.models import DAG

subdag_name = ".".join((parent_name, child_name))
logging.info(parent_name)
logging.info(subdag_name)
dag_subdag = DAG(
    dag_id=subdag_name,
    default_args=dargs,
    schedule_interval="@once",
)

In fact, I originally built almost all my dags as glorified cfg files for my subdags. Not sure how good an idea that is after some trial and error, but schedule interval was never a blocker for me.

I'm running a relatively recent build of 1.8 with few customizations. I've been following the example dag suggestion of keeping my subdags in a folder inside the dags folder so they don't show up in the DagBag.

apathyman
  • I'm using airflow 1.7.1.3, and 1.8 is not an option ATM because that version accidentally broke custom executor plugins. I'll take a look at 1.8 to see if running subdags with a schedule of `"@once"` is possible, but I would be surprised if that were true, as the documentation says it is not. – gnicholas Apr 22 '17 at 16:55
  • Any luck? My code is still happily running away. I tried to look up the canonical way to do this for you in 1.7. The closest thing I was able to find (assuming `@once` isn't viable) is to set the `execution_timeout` for the actual subdag task to something shorter than the execution frequency you've set in the subdag itself. That way you'll time out before it is possible for your subdag to launch more tasks. I know this is speculation, but I wasn't easily able to find a build of airflow in our fork that is as old as the one you're on. – apathyman Apr 27 '17 at 17:09
  • 1
    Would love to hear from the authors why this works when the docs explicitly say it should not. – qwwqwwq Aug 15 '17 at 23:59
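
The `execution_timeout` workaround suggested in the comments above could look roughly like this (an untested sketch for the 1.7 era; the task id, interval, and parent DAG object are assumptions):

```python
# Sketch of the suggested workaround: cap the SubDagOperator's runtime below
# the subdag's schedule_interval so a second backfill run can never start.
# "parent_dag", "args", and the 20-hour cutoff are illustrative assumptions.
from datetime import timedelta

from airflow.operators.subdag_operator import SubDagOperator

subdag_task = SubDagOperator(
    task_id="child_dag_name",
    subdag=subdag_factory("parent_dag_name", "child_dag_name", args),
    execution_timeout=timedelta(hours=20),  # shorter than the @daily interval
    dag=parent_dag,
)
```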

Try the external trigger pattern with schedule_interval=None for the subdag. In that case it will only run when triggered by the parent DAG.
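
One way to read this (a hedged sketch; the dag ids, callable, and parent DAG object are assumptions): define the child as a standalone DAG with schedule_interval=None and fire it from the parent with TriggerDagRunOperator.

```python
# Sketch of the external-trigger pattern in the Airflow 1.x API.
# "child_dag" and "parent_dag" are illustrative names; the child DAG is
# assumed to be defined elsewhere with schedule_interval=None.
from airflow.operators.dagrun_operator import TriggerDagRunOperator

def conditionally_trigger(context, dag_run_obj):
    # Returning the dag_run_obj tells Airflow to proceed with the trigger
    # (1.x-era callable signature).
    return dag_run_obj

trigger_child = TriggerDagRunOperator(
    task_id="trigger_child_dag",
    trigger_dag_id="child_dag",  # the schedule_interval=None DAG
    python_callable=conditionally_trigger,
    dag=parent_dag,
)
```

Note that, unlike a SubDagOperator, this task fires the run and moves on rather than blocking until the child DAG completes.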

Priyank Mehta
  • 3
    For clarification, you are suggesting using the [TriggerDagRunOperator](https://airflow.incubator.apache.org/code.html?highlight=trigger%20dagrun#airflow.operators.TriggerDagRunOperator) in order to trigger a dag without a schedule? The key to the subdag is we want *blocking* semantics, the trigger dagrun operator just triggers a dagrun and then moves on and does not wait until the dagrun is done. Additionally, you don't get transparency in the airflow UI that a subdag was run, you just know that some random dagrun was triggered. – gnicholas Apr 21 '17 at 22:26