
I have an Airflow service that currently runs as separate Docker containers for the webserver and scheduler, both backed by a Postgres database. The dags are synced between the two instances and load correctly when the services start. However, if I add a new dag to the dag folder (on both containers) while the service is running, the dag gets loaded into the DagBag but shows up in the web GUI with missing metadata. I can run "airflow initdb" after each update, but that doesn't feel right. Is there a better way for the scheduler and webserver to sync up with the database?

Brad Miro
  • You can set your scheduler service to restart every few minutes, it should pick up new dags after getting restarted. Just use `airflow scheduler -r 300`, this means that the scheduler exits every 300 seconds, so if you set up your service to always restart the scheduler, every new dag should get loaded within < 5 mins. – Christopher Beck Aug 16 '18 at 22:36
  • I just tested this and new dags get picked up correctly after the scheduler service restarts; you may have to wait a few seconds and refresh the web UI two or three times, but it works for me. – Christopher Beck Aug 16 '18 at 23:15
  • If I set the scheduler to restart every 300 seconds, do I miss any scheduled tasks that may have occurred during the time it was being restarted? As in, if I had a scheduled task for 10:00:00, I restart at 9:59:59 and it takes two seconds to restart, will it still know to execute this? – Brad Miro Aug 17 '18 at 14:51
  • That depends: if you have catchup=False, the scheduler will only run the most recent task/dag, so in the unrealistic case that one dag is supposed to start at 9:59:59 and one at 10:00:00, only the one with the 10:00:00 start time will run. To work around that you could set catchup=True. Otherwise, if your dag only starts, let's say, every hour, you should not see any problem. – Christopher Beck Aug 17 '18 at 20:49
  • You can easily test this by creating a dag that runs every 5 minutes, then stop the scheduler for 6 minutes after the dag ran the first time and then start it again. You should see that the dag that was scheduled to start while the scheduler wasn't running will get started. – Christopher Beck Aug 17 '18 at 20:51
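The restart trick suggested in the comments can be wired into the container setup itself. A minimal docker-compose sketch, where the service name, image tag, and volume paths are placeholders for your own setup:

```yaml
# Sketch only: image, paths, and environment are assumptions, not a
# drop-in config. The key lines are `restart` and `command`.
scheduler:
  image: your-airflow-image        # whatever image your scheduler runs today
  restart: always                  # docker restarts the container each time it exits
  command: airflow scheduler -r 300  # scheduler exits after 300 s, so every restart rebuilds the DagBag
  volumes:
    - ./dags:/usr/local/airflow/dags
```

With `restart: always`, the scheduler exiting every 300 seconds is harmless: Docker immediately starts it again, and the fresh process picks up any new dag files.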

2 Answers


Dag updates should be picked up automatically. If they don't get picked up, it's often because the change you made "broke" the dag.

To check that new tasks are in fact picked up, on your webserver, run:

airflow list_tasks <dag_name> --tree

If it says Dag not found, then there's an error in the dag file.
If it runs successfully, it should list all your tasks, and those tasks should appear in the Airflow UI when you refresh it.

If the new/updated tasks are not showing up there, then check the dags folder on your webserver and verify that the code is indeed being updated.

Marco
Charlie Gelman

In airflow.cfg you can change the parameters that control how quickly new DAG files are picked up:

# seconds to wait before re-parsing an already-known DAG file
min_file_process_interval = 0
# seconds between scans of the dag folder for new files
dag_dir_list_interval = 60

By default, dag_dir_list_interval is 300 seconds, so new DAG files are noticed within about five minutes; you can change it to suit your requirements.
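To check which values are actually in effect, airflow.cfg is an ini-style file that the standard-library configparser can read; both settings live in the [scheduler] section. The file path below is an assumption, so adjust it to your $AIRFLOW_HOME:

```python
import configparser

# Parse an airflow.cfg-style file and print the two scan intervals.
# The path is an assumption; point it at your own $AIRFLOW_HOME.
cfg = configparser.ConfigParser()
cfg.read("/usr/local/airflow/airflow.cfg")

# Fall back to the documented defaults if the keys are not set
# explicitly (or the file is missing).
print(cfg.getint("scheduler", "min_file_process_interval", fallback=0))
print(cfg.getint("scheduler", "dag_dir_list_interval", fallback=300))
```

Running this inside both containers is a quick way to confirm the webserver and scheduler are reading the same configuration.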

kiran kumar