I am creating an Airflow pipeline that pulls comment data from the API of a popular forum. For this I am creating two separate DAGs:
- one DAG with `schedule_interval` set to every minute, which checks for new posts and inserts them into a database
- another DAG that I run manually to backfill my database with historic data. This DAG simply looks for posts older than the oldest post in my database. For example, if the oldest post in my db had `id` `1000`, I would trigger the DAG with argument `100` (the number of historic posts I want) to fetch all posts between `1000` and `900`.
I have already created both DAGs. For now I want to keep DAG #2 manual so that I can trigger it whenever I want more historic data, but I do not want it to interfere with the schedule of DAG #1. I would therefore like to implement a system where, when DAG #2 is triggered, Airflow first checks whether DAG #1 is running and, if so, waits until DAG #1 has finished before proceeding. Likewise, DAG #1 should check whether DAG #2 is running before it executes and, if so, wait until DAG #2 has finished. In other words, I want a mutual dependency between the two DAGs, so that they never run at the same time and each waits for the other to finish before proceeding.
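To make the desired behaviour concrete, here is a minimal sketch of the check-and-wait step that each DAG could run as its first task. It is an illustration, not a tested solution. It assumes Airflow 2.x, where `DagRun.find` can be queried from inside a task. The helper names `wait_until_idle` and `other_dag_is_running`, and the DAG ids mentioned afterwards, are hypothetical.

```python
import time
from typing import Callable


def wait_until_idle(
    is_other_running: Callable[[], bool],
    poke_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until the other DAG reports no active run; raise on timeout."""
    waited = 0.0
    while is_other_running():
        if waited >= timeout:
            raise TimeoutError("the other DAG is still running")
        sleep(poke_interval)
        waited += poke_interval


def other_dag_is_running(dag_id: str) -> bool:
    """Return True if `dag_id` has at least one run in the RUNNING state.

    Assumption: Airflow 2.x. The imports are local so that
    wait_until_idle() can be used and tested without Airflow installed.
    """
    from airflow.models import DagRun
    from airflow.utils.state import State

    return len(DagRun.find(dag_id=dag_id, state=State.RUNNING)) > 0
```

In DAG #2, the first task would then call something like `wait_until_idle(lambda: other_dag_is_running("fetch_new_posts"))`, and DAG #1 would do the same with DAG #2's id (both ids are placeholders). This check alone is not airtight: if both DAGs poll at the same instant, both can see the other as idle and proceed. Assigning the tasks of both DAGs to a shared Airflow pool with a single slot may be a simpler way to keep their tasks from running concurrently.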