
I am creating an Airflow pipeline that pulls comment data from the API of a popular forum. For this I am creating two separate DAGs:

  1. one DAG with `schedule_interval` set to every minute that checks for new posts and inserts them into a database
  2. another DAG that I run manually to backfill my database with historic data. This DAG simply looks for posts older than the oldest post in my database. For example, if the oldest post in my db had id 1000, I would trigger the DAG with argument 100 (the number of historic posts I want) to fetch all posts between 900 and 1000.
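As a minimal sketch of the backfill arithmetic in item 2, assuming post ids are contiguous integers (which the forum's API may not guarantee), the ids to request can be computed as below. The function name `backfill_id_range` is hypothetical. The count could be passed through the run's conf, e.g. `airflow dags trigger backfill_posts --conf '{"num_posts": 100}'`, where the DAG id `backfill_posts` and the key `num_posts` are made up for illustration.

```python
def backfill_id_range(oldest_id: int, num_posts: int) -> range:
    """Return the post ids to fetch when backfilling `num_posts` posts older than `oldest_id`.

    Hypothetical helper; assumes post ids are contiguous integers.
    """
    if num_posts <= 0:
        raise ValueError("num_posts must be positive")
    start = max(oldest_id - num_posts, 0)  # clamp so we never request negative ids
    return range(start, oldest_id)  # excludes oldest_id, which is already stored


# Example from the question: oldest stored post is 1000, request 100 historic posts.
ids = backfill_id_range(1000, 100)
print(ids.start, ids.stop - 1, len(ids))  # 900 999 100
```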

I have already created both DAGs, and for now I want to keep DAG #2 manual so that I can trigger it whenever I want more historic data. The problem is that I do not want it to interfere with the schedule of DAG #1. So I would like a system where, when DAG #2 is triggered, Airflow first checks whether DAG #1 is running and, IF SO, waits until DAG #1 has finished before proceeding. I also want the same in the other direction: before executing, DAG #1 should check whether DAG #2 is running and, if so, wait until DAG #2 has finished. In other words, I want a two-way dependency between the DAGs so that they never run at the same time, and each one waits for the other to finish before proceeding.
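A naive "wait while the other DAG is running" guard in both DAGs can deadlock: if the two DAGs start together, each sees the other as running and both wait forever. One possible way around this, sketched here under assumptions, is to have each run wait only for *earlier* running runs of the other DAG, ordering runs by (start time, DAG id) so exactly one of any two runs proceeds. The code is plain Python so it can be tested on its own; `RunInfo`, `may_proceed`, and the DAG ids `poll_new_posts` / `backfill_posts` are hypothetical stand-ins, not Airflow APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class RunInfo:
    """Hypothetical stand-in for a DAG run: which DAG it belongs to and when it started."""
    dag_id: str
    start_date: datetime


def may_proceed(me: RunInfo, running_runs_of_other_dag: Iterable[RunInfo]) -> bool:
    """Return True if `me` can start working, False if it should keep waiting.

    Waiting only on runs of the other DAG that sort *before* `me` by
    (start_date, dag_id) gives a total order, so two runs can never
    both wait on each other.
    """
    my_key = (me.start_date, me.dag_id)
    return all((other.start_date, other.dag_id) > my_key
               for other in running_runs_of_other_dag)


t0 = datetime(2021, 5, 14, 13, 0)
poller = RunInfo("poll_new_posts", t0)
backfill = RunInfo("backfill_posts", t0 + timedelta(minutes=1))
print(may_proceed(poller, [backfill]))  # True: the poller started first
print(may_proceed(backfill, [poller]))  # False: the backfill waits
```

In Airflow 2 this check could (untested) be wrapped in a `PythonSensor` from `airflow.sensors.python` as the first task of each DAG, with `mode="reschedule"` so the waiting sensor does not hold a worker slot, and the list of other runs built from `DagRun.find(dag_id=<other DAG id>, state=State.RUNNING)` together with the current `dag_run` from the task context. Setting `max_active_runs=1` on each DAG would additionally stop runs of the *same* DAG from overlapping, which this check does not cover.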

  • Check [ExternalTaskSensor](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/external_task_sensor.html); I think it may apply to your use case. – NicoE May 14 '21 at 13:42
