I have an Airflow DAG where I need to get the parameters the DAG was triggered with from the Airflow context.

Previously, I had the code to get those parameters inside a DAG step (I'm using the TaskFlow API from Airflow 2), similar to this:

from typing import Any, Dict
from airflow.decorators import dag, task
from airflow.operators.python import get_current_context
from airflow.utils.dates import days_ago

default_args = {"owner": "airflow"}

@dag(
    default_args=default_args,
    start_date=days_ago(1),
    schedule_interval=None,
    tags=["my_pipeline"],
)
def my_pipeline():
    @task(multiple_outputs=True)
    def get_params() -> Dict[str, Any]:
        context = get_current_context()
        params = context["params"]
        assert isinstance(params, dict)
        return params

    params = get_params()

pipeline = my_pipeline()

This worked as expected.

However, I needed these parameters in several steps, so I thought it would be a good idea to move the code that gets them into a separate function at module (global) scope, like this:

# ...
from airflow.operators.python import get_current_context

# other top-level code here

def get_params() -> Dict[str, Any]:
    context = get_current_context()
    params = context["params"]
    return params

@dag(...)
def my_pipeline():
    @task()
    def get_data():
        params = get_params()

    # other DAG tasks here
    get_data()

pipeline = my_pipeline()

Now, this breaks right on DAG import, with the following error (names changed to match the examples above):

Broken DAG: [/home/airflow/gcs/dags/my_pipeline.py] Traceback (most recent call last):
  File "/home/airflow/gcs/dags/my_pipeline.py", line 26, in get_params
    context = get_current_context()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/operators/python.py", line 467, in get_current_context
    raise AirflowException(
airflow.exceptions.AirflowException: Current context was requested but no context was found! Are you running within an airflow task?

And I get what the error is saying and how to fix it (move the code to get context back inside a @task). But my question is -- why does the error come up right on DAG import?

`get_params` doesn't get called anywhere outside of other tasks, and those tasks obviously don't run until the DAG runs. So why does the code in `get_params` run at all when the DAG is imported?

At this point, I want to understand this simply because the fact that this error comes up when it does is breaking my understanding of how Python modules are evaluated on import. Code within a function shouldn't run until the function is called, and the only error that can come up before then is a SyntaxError (and maybe some other compile-time errors that I'm not remembering right now).

Is Airflow doing some special magic, or is there something simpler going on that I'm missing?
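To illustrate the mental model I'm working from, here is a minimal standalone snippet (no Airflow involved):

```python
# Defining (or importing) a function compiles its body but does not execute it.
def boom():
    raise RuntimeError("only raised when the function is actually called")

# The definition above raised nothing; the body runs only on the call.
try:
    boom()
except RuntimeError as exc:
    print(exc)  # -> only raised when the function is actually called
```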

I am running Airflow 2.1.2 managed by Google Cloud Composer 1.17.2.

anna_hope

1 Answer

Unfortunately I am not able to reproduce your issue. The similar code below parses, renders a DAG, and completes successfully on Airflow 2.0, 2.1, and 2.2:

from datetime import datetime
from typing import Any, Dict

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context


def get_params() -> Dict[str, Any]:
    context = get_current_context()
    params = context["params"]
    return params


@dag(
    dag_id="get_current_context_test",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    params={"my_param": "param_value"},
)
def my_pipeline():
    @task()
    def get_data():
        params = get_params()
        print(params)

    get_data()


pipeline = my_pipeline()

Task log snippet: (screenshot of the successful task run)

However, context objects are directly accessible in task-decorated functions. You can update the task signature(s) to include a `params=None` arg (the default value lets the file parse without a `TypeError`) and then apply whatever logic you need with that arg. The same works for `ti`, `dag_run`, etc. Perhaps this helps?

@dag(
    dag_id="get_current_context_test",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    params={"my_param": "param_value"},
)
def my_pipeline():
    @task()
    def get_data(params=None):
        print(params)

    get_data()


pipeline = my_pipeline()
Josh Fell
    Thanks! It's possible I missed a `@task` decorator on one of the steps that was then called in the main `@dag` function. But in that case, the Airflow error is misleading, because it doesn't include the whole traceback. And thank you for pointing out that context objects are accessible directly! – anna_hope Oct 22 '21 at 02:48
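For later readers: a toy sketch of the mechanism the comment describes. The `task` decorator below is a hypothetical stand-in (not Airflow's real implementation) that only registers a call; when the decorator is forgotten, the function body runs while the DAG file is being parsed, which is exactly when `get_current_context` has no context to return:

```python
# Hypothetical stand-in for Airflow's @task: a decorated call only records the
# task; the real body is deferred until task execution.
def task(fn):
    def register(*args, **kwargs):
        return f"registered {fn.__name__}"
    return register

def get_current_context():
    # Stand-in for what Airflow's get_current_context does outside a running task.
    raise RuntimeError("Current context was requested but no context was found!")

@task
def decorated_step():
    return get_current_context()

def undecorated_step():  # the hypothesized bug: @task forgotten
    return get_current_context()

def my_pipeline():  # plays the role of the @dag-decorated function
    decorated_step()    # fine: only registered, body never runs at parse time
    undecorated_step()  # plain call: body runs immediately during "import"

try:
    my_pipeline()  # mirrors the Broken DAG error surfacing at DAG import
except RuntimeError as exc:
    print(f"parse-time failure: {exc}")
```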