I recently upgraded Airflow from v1.7.1.2 to v1.9.0, and after the upgrade I noticed that CPU usage increased significantly. After some digging, I tracked it down to these two scheduler config options: min_file_process_interval (defaults to 0) and max_threads (defaults to 2).
As expected, increasing min_file_process_interval avoids the tight loop and drops CPU usage when the scheduler goes idle. What I don't understand is why min_file_process_interval affects task execution.
If I set min_file_process_interval to 60s, the scheduler now waits at least 60s before executing each task in my DAG, so a DAG with 4 sequential tasks now takes 4 minutes longer to run. For example:
start -> [task1] -> [task2] -> [task3] -> [task4]
      ^          ^          ^          ^
      60s        60s        60s        60s
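To put a number on it, here is a minimal sketch of the arithmetic, assuming each sequential task waits one full min_file_process_interval before it starts. That matches what I observe, but I have not confirmed it in the scheduler code, and the function name is made up for illustration.

```python
def added_delay_seconds(num_sequential_tasks: int, min_file_process_interval: int) -> int:
    """Estimate the extra wall-clock time added to a purely sequential DAG.

    Assumption (from observation, not from the Airflow source): every task,
    including the first, waits one full interval before it is started.
    """
    return num_sequential_tasks * min_file_process_interval


if __name__ == "__main__":
    # 4 sequential tasks with a 60s interval
    extra = added_delay_seconds(4, 60)
    print(f"extra delay: {extra}s ({extra / 60:.0f} min)")
```

With the default interval of 0 the same estimate is 0, which is consistent with the delay appearing only after I raised the setting.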
I have Airflow set up in both my test and prod environments. This is less of an issue in prod (although still concerning), but it is a big issue in test. Since the upgrade, CPU usage is significantly higher, so I either accept the higher CPU usage or try to reduce it by raising min_file_process_interval. However, raising it adds significant time to my test DAGs' execution.
Why does min_file_process_interval affect time between tasks after the DAG has been scheduled? Are there other config options that could solve my issue?