
I recently started using Apache Airflow and, after creating DAGs and tasks the conventional way, decided to try the TaskFlow API. However, I ran into some issues, so here are my questions.

Conventional way:

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="abc_test_dag",
    start_date=days_ago(1),
) as dag:

    start = PythonOperator(
        task_id="start",
        python_callable=lambda: print("Starting without returning anything")
    )

    end = PythonOperator(
        task_id="end",
        python_callable=lambda: print("Ended without accepting/returning anything")
    )

    start >> end

Using TaskFlow:

from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(
    start_date=days_ago(1),
    catchup=False,
    schedule_interval=None
) 
def ab_test_dag():

    @task()
    def start(ds=None):
        print(f"starting at {ds}")
        return 1  # Question 1: does this task have to return something?
    
    @task()
    def end(in_parm):  # Question 2: does this task have to accept a parameter?
        print("ended")
    
    
    end(start())

ab_dag = ab_test_dag()

Question1) It turns out that every task except the last one HAS TO return some value, even if that value is never used later in the pipeline. Is this correct? Otherwise Airflow throws the error airflow.exceptions.AirflowException: XComArg result from start at ab_test_dag with key="return_value" is not found! How do I create a task without a return value, say one that just creates a landing folder?

Question2) To build a chain of tasks, every task after the first one (in this case 'end') HAS TO accept a parameter. Is that correct? Otherwise I get a DAG import error: too many positional arguments.

Long story short: how can I make the pipeline below work?

from airflow import DAG
from airflow.decorators import task
from airflow.utils.dates import days_ago

@task
def start(): print("Starting...")

@task
def end(): print("Ended")

with DAG(
    dag_id="ab_test_dag",
    start_date=days_ago(1),
) as dag:

    # Conventional way of chaining, which works.
    start_task = start()
    end_task = end()
    start_task >> end_task

    # TaskFlow method of chaining tasks, which **DOESN'T WORK**.
    # end(start())
Richard



For question 1: I don't think tasks are required to return values. Here is an example from Airflow:

"""Example DAG demonstrating the usage of the @taskgroup decorator."""

import pendulum

from airflow.decorators import task, task_group
from airflow.models.dag import DAG


# [START howto_task_group_decorator]
# Creating Tasks
@task
def task_start():
    """Empty Task which is First Task of Dag"""
    return '[Task_start]'


@task
def task_1(value: int) -> str:
    """Empty Task1"""
    return f'[ Task1 {value} ]'


@task
def task_2(value: str) -> str:
    """Empty Task2"""
    return f'[ Task2 {value} ]'

# A task without a return.
@task
def task_3(value: str) -> None:
    """Empty Task3"""
    print(f'[ Task3 {value} ]')


@task
def task_end() -> None:
    """Empty Task which is Last Task of Dag"""
    print('[ Task_End  ]')


# Creating TaskGroups
@task_group
def task_group_function(value: int) -> None:
    """TaskGroup for grouping related Tasks"""
    task_3(task_2(task_1(value)))


# Executing Tasks and TaskGroups
with DAG(
    dag_id="example_task_group_decorator",
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
) as dag:
    start_task = task_start()
    end_task = task_end()
    for i in range(5):
        current_task_group = task_group_function(i)
        start_task >> current_task_group >> end_task

# [END howto_task_group_decorator]

So a task does not always have to return something.
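The difference between the question's two pipelines can be sketched with a toy model in plain Python (no Airflow needed). Everything in it is hypothetical: the dict standing in for XCom, and the helpers run_task and pull, are illustrative names, not Airflow APIs. It assumes, as the error in the question suggests, that a task returning None stores nothing, so a downstream task that needs that value as input fails, while an ordering-only dependency (>>) never looks the value up. Whether a None result is stored may differ between Airflow versions.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical toy model, NOT Airflow's implementation or API: a plain dict
# stands in for the XCom table, and a result is stored only if it is not None.
xcom_store: Dict[str, Any] = {}


def run_task(task_id: str, fn: Callable[..., Any], *args: Any) -> None:
    """Run fn(*args) and store its return value under task_id, skipping None."""
    result = fn(*args)
    if result is not None:
        xcom_store[task_id] = result


def pull(task_id: str) -> Any:
    """Resolve a data dependency: fail if the upstream task stored nothing."""
    if task_id not in xcom_store:
        # Message modeled on the error quoted in the question.
        raise LookupError(f'result from {task_id} with key="return_value" is not found!')
    return xcom_store[task_id]


def create_landing_folder() -> None:
    """Side effect only (e.g. making a folder); implicitly returns None."""
    print("creating landing folder")


def end(upstream: Optional[Any] = None) -> None:
    print("ended")


# Ordering-only dependency, like start >> end: end never asks for a value,
# so it does not matter that the first task returned None.
run_task("start", create_landing_folder)
run_task("end", end)

# Data dependency, like end(start()): end's argument must be pulled first,
# and that lookup fails because nothing was stored for "start".
data_dependency_error: Optional[str] = None
try:
    run_task("end", end, pull("start"))
except LookupError as exc:
    data_dependency_error = str(exc)
print(data_dependency_error)
```

In this model a side-effect-only task such as creating a landing folder can simply return None, as long as it is wired to the next task with >> rather than by passing its result.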

For question 2: when you write a pipeline whose steps are tasks, you naturally want to chain them. So writing:

end(start())

or

end(second_step(first_step()))

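is just a way to tell the Airflow scheduler what to do. Nesting the calls does two things at once: it passes each task's return value to the next task as an argument (which is why the downstream function needs a parameter to receive it) and it sets the upstream/downstream dependency between them.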
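The "too many positional arguments" text in the question is exactly the message Python's inspect.Signature.bind raises when a call passes more positional arguments than the function declares. My assumption (not verified against the Airflow source here) is that the @task decorator checks its arguments this way when the DAG file is parsed. A plain-Python sketch, where check_call is a hypothetical helper:

```python
import inspect
from typing import Any, Callable, Optional


def check_call(fn: Callable[..., Any], *args: Any) -> Optional[str]:
    """Hypothetical helper: return the TypeError message that binding args to
    fn's signature raises, or None if the arguments fit. fn is never called."""
    try:
        inspect.signature(fn).bind(*args)
    except TypeError as exc:
        return str(exc)
    return None


def end() -> None:
    print("Ended")


def end_with_param(upstream: Any = None) -> None:
    print("Ended")


# end(start()) hands start's result to end as one positional argument,
# but end() declares no parameters, so binding fails:
too_many = check_call(end, "result of start()")
print(too_many)  # too many positional arguments

# Declaring a parameter, even one that is never used, lets the call bind:
fits = check_call(end_with_param, "result of start()")
print(fits)  # None
```

So end only needs a parameter when you use the nesting style; with start() >> end(), neither function has to accept or return anything.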

  • Regarding Q1: the example works without a return value because task3 is chained to end_task the conventional way. Since task3 is the last task of task_group_function, current_task_group >> end_task effectively means ...task3 >> end_task. task3 doesn't have to return a value here because it is chained with >> rather than by passing its result to end_task. To illustrate: – Richard Jul 20 '22 at 15:54