I recently started using Apache Airflow and, after creating DAGs and tasks the conventional way, decided to try the TaskFlow API. However, I ran into some issues, so here are my questions.
Conventional way:
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="abc_test_dag",
    start_date=days_ago(1),
) as dag:
    start = PythonOperator(
        task_id="start",
        python_callable=lambda: print("Starting without returning anything"),
    )
    end = PythonOperator(
        task_id="end",
        python_callable=lambda: print("Ended without accepting/returning anything"),
    )
    start >> end
Using TaskFlow:
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(
    start_date=days_ago(1),
    catchup=False,
    schedule_interval=None,
)
def ab_test_dag():
    @task()
    def start(ds=None):
        print(f"starting at {ds}")
        return 1  # Question 1: have to return something

    @task()
    def end(in_parm):  # Question 2: have to accept a parameter
        print("ended")

    end(start())

ab_dag = ab_test_dag()
Question 1) It turns out that every task except the last one HAS TO return some value, even if that value is not used further in the pipeline. Is this correct? Otherwise Airflow throws an error: airflow.exceptions.AirflowException: XComArg result from start at ab_test_dag with key="return_value" is not found! How do I create a task without a return value, say if I'm just creating a landing folder?
Question 2) In order to build a chain of tasks, any non-first task (in our case the task 'end') HAS TO accept a parameter. Is that correct? Otherwise I get a DAG import error: too many positional arguments.
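To illustrate what I think is happening in Question 2, here is a minimal sketch in plain Python with no Airflow involved. It assumes the decorator checks the call arguments against the function signature with something like inspect.signature(...).bind(...); that is my guess based on the error text, not something I have confirmed in the Airflow source. The helper try_bind and the placeholder upstream_result are hypothetical names used only for this illustration.

```python
import inspect


def end():
    """Stand-in for the body of the `end` task: it accepts no parameters."""
    print("Ended")


def try_bind(func, *args):
    """Return the binding error message, or None if the arguments fit."""
    try:
        # Checks the arguments against the signature without calling func.
        inspect.signature(func).bind(*args)
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical placeholder for whatever start() hands to end() in end(start()).
upstream_result = object()

print(try_bind(end))                   # no arguments: binding succeeds
print(try_bind(end, upstream_result))  # one extra argument: binding fails
```

If that assumption holds, writing end(start()) passes the result of start() as a positional argument to a function that declares none, which would explain the import-time error.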
Long story short: how do I make the pipeline below work?
from airflow import DAG
from airflow.decorators import task
from airflow.utils.dates import days_ago

@task
def start():
    print("Starting...")

@task
def end():
    print("Ended")

with DAG(
    dag_id="ab_test_dag",
    start_date=days_ago(1),
) as dag:
    # Conventional way of chaining, which works.
    start_task = start()
    end_task = end()
    start_task >> end_task
    # TaskFlow method of chaining tasks, which **DOESN'T WORK**.
    # end(start())