16

As you can see in the image (DAG latency between tasks), Airflow is taking too much time between task executions; the gaps account for almost 30% of the DAG execution time. I've changed the airflow.cfg file to:

job_heartbeat_sec = 1 
scheduler_heartbeat_sec = 1

but I still have the same latency rate.

Why does it behave this way?

Dadep
I.Chorfi

2 Answers

15

This is by design. For instance, I use Airflow to run large workflows in which some tasks take a really long time. Airflow is not meant for tasks that take seconds to execute; it can be used for that, of course, but it might not be the most suitable tool.

With that said, there is not much you can do, since you have already found the key settings to configure.

Additionally, you might want to try increasing the number of scheduler threads:

   [scheduler]
   max_threads = 4

This can alternatively be done by setting the environment variable:

AIRFLOW__SCHEDULER__MAX_THREADS=4
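The variable name above follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding airflow.cfg options. A minimal Python sketch of that mapping, in case you want to generate overrides for several options at once; the helper name `airflow_env_var` and the `overrides` dict are made up for illustration and are not part of Airflow:

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides an airflow.cfg option.

    Hypothetical helper: it only applies the AIRFLOW__{SECTION}__{KEY}
    naming convention (upper case, double underscores).
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Options mentioned in this thread, with the values suggested here.
overrides = {
    airflow_env_var("scheduler", "max_threads"): "4",
    airflow_env_var("scheduler", "scheduler_heartbeat_sec"): "1",
    airflow_env_var("scheduler", "job_heartbeat_sec"): "1",
}

# Print shell export lines to run before starting the scheduler.
for name, value in overrides.items():
    print(f"export {name}={value}")
```

The variables have to be set in the environment of the scheduler process before it starts; exporting them afterwards has no effect on a running scheduler.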

However, do not count on the latency decreasing by much.

Hito
  • Hi @Hito, **why** though does it have a long delay (~30 seconds for me)? Is Airflow incapable of launching that fast? – Robert Lugg Jul 19 '19 at 23:39
  • Hi Robert, other than applying a lower heartbeat like in the OP, I do not know of any way to speed it up. But again, if you need to manually trigger a task that should be executed ASAP (within seconds), maybe Airflow is not the right tool for the job. – Hito Jul 22 '19 at 10:48
6

Thirty seconds is fairly high for inter-task latency. In well-tuned environments I've seen, ~4-6 seconds between a task and a dependent task has been a fairly reasonable lower bound, even for environments with many thousands of DAGs.

As you've already stated, increasing the scheduler heartbeat rate (that is, lowering scheduler_heartbeat_sec) and increasing the number of threads the scheduler has (scheduler.max_threads) are the best ways to decrease scheduling delays. If your tasks are blocked on other conditions (which you can check in the logs; set core.logging_level = DEBUG for even more information), then you should resolve those first.

If you've adjusted both the scheduler heartbeat and the number of scheduler threads and you still see high scheduling delays, then you may need to consider using a more powerful machine.
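To judge whether a tuning change actually helped, it can be useful to put a number on the delay. The following is a rough sketch, assuming a linear chain of tasks and assuming you have copied each task's start and end time (for example from the task instance details in the Airflow UI); the function name `scheduling_overhead` and the sample timestamps are invented for illustration:

```python
from datetime import datetime, timedelta


def scheduling_overhead(task_runs):
    """Sum the idle gaps between consecutive tasks in a linear chain.

    task_runs: list of (start, end) datetime pairs, one per task.
    Returns (total_gap, share), where share is the fraction of the
    wall-clock DAG duration spent waiting between tasks.
    """
    runs = sorted(task_runs)
    total_gap = timedelta(0)
    for (_, prev_end), (next_start, _) in zip(runs, runs[1:]):
        gap = next_start - prev_end
        if gap > timedelta(0):  # ignore overlapping runs
            total_gap += gap
    wall_clock = runs[-1][1] - runs[0][0]
    return total_gap, total_gap / wall_clock


# Hypothetical run: three 10-second tasks with 5-second gaps between them.
t0 = datetime(2019, 1, 1, 0, 0, 0)
sample_runs = [
    (t0, t0 + timedelta(seconds=10)),
    (t0 + timedelta(seconds=15), t0 + timedelta(seconds=25)),
    (t0 + timedelta(seconds=30), t0 + timedelta(seconds=40)),
]
total_gap, share = scheduling_overhead(sample_runs)
print(f"gap: {total_gap.total_seconds():.0f}s, share of DAG time: {share:.0%}")
```

Running the same calculation before and after changing the heartbeat or thread settings shows whether the gaps moved toward the few-second range mentioned above. For DAGs with parallel branches the gaps would need to be computed per dependency edge instead.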

hexacyanide