
Note: all examples below aim for a single serial chain of tasks.

With the TaskFlow API, tasks are easy to chain serially when none of them returns data that a later task uses:

t1() >> t2()

If task t1 returns a value and task t2 uses it, passing the value also links them:

return_value = t1()
t2(return_value)

However, when some tasks return values and others do not, it is no longer clear how to combine the two styles. This:

t1() >> 
returned_value = t2()
t3(returned_value)

fails with a syntax error (>> needs an expression on its right-hand side, and an assignment is a statement).

The following runs, but does not generate the required serial DAG (t1 >> t2 >> t3):

t1()
returned_value = t2()
t3(returned_value)

since nothing links t1 to t2 or t3, so t1 runs in parallel with the t2 >> t3 chain.

One workaround is to force t2 to take a return value from t1, even though it does not need it:

returned_fake_t1 = t1()
returned_value_t2 = t2(returned_fake_t1)
t3(returned_value_t2)

But this is not elegant, since it changes the task logic just to express ordering. What is the idiomatic way to express this in the TaskFlow API? I could not find this scenario in the Airflow documentation.

xmar

1 Answer


I struggled with the same problem. In the end I solved it like this:

t1_res = t1()
t2_res = t2()
t3_res = t3(t2_res)
t1_res >> t2_res >> t3_res

The good thing is that you don't need to return anything from t1() or add a fake parameter to t2(). Kudos to this article and its author: https://khashtamov.com/ru/airflow-taskflow-api/
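To see why keeping the handles and chaining them works, here is a toy model in plain Python. It is not Airflow code: Node and edges are hypothetical names invented for this sketch. It only mimics the two behaviours that matter here, as far as I understand them: passing one task's result into another creates a dependency, and a >> b records a dependency and evaluates to b, which is what lets a chain read left to right.

```python
# Toy stand-in for TaskFlow dependency wiring. NOT Airflow code:
# Node and edges are hypothetical names used only for illustration.
class Node:
    """Stands in for the object that calling a @task function returns."""

    def __init__(self, name, *inputs):
        self.name = name
        # Passing another node's result as an argument creates an edge.
        self.upstream = {n.name for n in inputs}

    def __rshift__(self, other):
        # a >> b records "a runs before b" and returns b, so chains work.
        other.upstream.add(self.name)
        return other


def edges(*nodes):
    """Return the set of (upstream, downstream) name pairs."""
    return {(up, n.name) for n in nodes for up in n.upstream}


# Variant from the question: t1 is never linked to anything.
t1, t2 = Node("t1"), Node("t2")
t3 = Node("t3", t2)
unlinked = edges(t1, t2, t3)

# Variant from this answer: keep the handles, then chain them explicitly.
t1_res, t2_res = Node("t1"), Node("t2")
t3_res = Node("t3", t2_res)
t1_res >> t2_res >> t3_res
chained = edges(t1_res, t2_res, t3_res)

print(sorted(unlinked))  # [('t2', 't3')]
print(sorted(chained))   # [('t1', 't2'), ('t2', 't3')]
```

In the first variant t1 has no edge at all, which is why it would run in parallel. In the second, the t2 to t3 edge is declared twice (once by passing the value, once by >>), which is harmless in this model because the edges are kept in a set.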

  • Thanks Mikhail. This is better so I accept the answer. However, it is surprising that there's not a better way, since it still doesn't look intuitive and clean to me. – xmar May 02 '23 at 09:30