
I am working in $AIRFLOW_HOME/dags. I have created the following files:

- common
  |- __init__.py   # empty
  |- common.py     # common code
- foo_v1.py        # DAG instantiation

In common.py:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = ...

def create_dag(project, version):
  dag_id = project + '_' + version
  dag = DAG(dag_id, default_args=default_args, schedule_interval='*/10 * * * *', catchup=False)
  print('creating DAG ' + dag_id)

  t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

  t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    retries=3,
    dag=dag)

  t2.set_upstream(t1)

In foo_v1.py:

 from common.common import create_dag

 create_dag('foo', 'v1')

When testing the script with python, it looks OK:

 $ python foo_v1.py
 [2018-10-29 17:08:37,016] {__init__.py:57} INFO - Using executor SequentialExecutor
 creating DAG pgrandjean_pgrandjean_spark2.1.0_hadoop2.6.0

I then launch the webserver and the scheduler locally. My problem is that I don't see any DAG with the id foo_v1 in the web UI, and no .pyc file is created for it. What am I doing wrong? Why isn't the code in foo_v1.py being executed?

pgrandjean

3 Answers


To be found by Airflow, the DAG object created by create_dag() must end up in the global namespace of the foo_v1.py module, so create_dag() has to return the DAG it builds. One way to place a DAG in the global namespace is simply to assign it to a module-level variable:

from common.common import create_dag

dag = create_dag('foo', 'v1')

Another way is to update the global namespace using globals():

globals()['foo_v1'] = create_dag('foo', 'v1')

The latter may look like overkill, but it is useful for creating multiple DAGs dynamically, for example in a for loop:

for i in range(10):
    globals()[f'foo_v{i}'] = create_dag('foo', f'v{i}')

Note: Any *.py file placed in $AIRFLOW_HOME/dags (even in sub-directories, such as common in your case) will be parsed by Airflow. If you do not want this, you can use .airflowignore or packaged DAGs.
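
For example, assuming the layout from the question, a .airflowignore file in $AIRFLOW_HOME/dags containing a single line should keep the scheduler from parsing the common sub-directory (each non-empty line is treated as a regular expression matched against file paths under the dags folder):

 common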

SergiyKolesnikov

You need to assign the DAG to a variable at module level. If the DAG isn't in the module's __dict__, Airflow's DagBag processor won't pick it up.

Check out the source here: https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L428
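
A rough sketch of what that processing amounts to (simplified, not the actual Airflow implementation): the file is imported as a module, and only objects bound at module level that are DAG instances get collected:

 import importlib.util
 from airflow import DAG

 def collect_dags(filepath):
     # Load the file as a module, the same way a DAG file gets imported.
     spec = importlib.util.spec_from_file_location('dag_module', filepath)
     module = importlib.util.module_from_spec(spec)
     spec.loader.exec_module(module)
     # Only names bound at module level are visible here; a DAG created
     # inside a function and never assigned or returned stays invisible.
     return {name: obj for name, obj in vars(module).items()
             if isinstance(obj, DAG)}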

nimish

As mentioned here, you must return the DAG after creating it!

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = ...

def create_dag(project, version):
  dag_id = project + '_' + version
  dag = DAG(dag_id, default_args=default_args, schedule_interval='*/10 * * * *', catchup=False)
  print('creating DAG ' + dag_id)

  t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

  t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    retries=3,
    dag=dag)

  t2.set_upstream(t1)

  return dag # Add this line to your code!
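
With create_dag() returning the DAG, foo_v1.py then has to bind the result to a module-level name (as in the first answer) so that Airflow can discover it:

 from common.common import create_dag

 dag = create_dag('foo', 'v1')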
Mostafa Ghadimi