Questions tagged [airflow]

Apache Airflow is a workflow management platform to programmatically author, schedule, and monitor workflows as directed acyclic graphs (DAGs) of tasks.

Airflow is a workflow scheduler. It was developed by Airbnb to manage its complicated workflows.

10104 questions
3
votes
2 answers

Run Multiple Athena Queries in Airflow 2.0

I am trying to create a DAG in which one of the tasks runs an Athena query using boto3. It worked for one query; however, I am facing issues when I try to run multiple Athena queries. This problem can be broken down as follows: If one goes through this blog,…
Vineet
  • 723
  • 4
  • 12
  • 31
3
votes
1 answer

Passing macros value to sql file in airflow

I have a SQL file containing this query: delete from xyz where id in = 3 and time = '{{ execution_date.subtract(hours=2).strftime("%Y-%m-%d %H:%M:%S") }}'; Here I am writing the macro in the SQL query itself; I want to pass its value from the Python file where…
Arpit Pruthi
  • 177
  • 10
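For readers skimming this entry, here is a minimal sketch, outside Airflow, of the value the template expression in the excerpt is meant to produce. It assumes the execution date is an ordinary datetime and uses the standard library's `timedelta` in place of the `.subtract(hours=2)` call; the function name `two_hours_before` is hypothetical and not part of Airflow.

```python
from datetime import datetime, timedelta


def two_hours_before(execution_date: datetime) -> str:
    """Format the moment two hours before execution_date.

    Hypothetical helper: mirrors the strftime pattern from the excerpt,
    with timedelta standing in for the .subtract(hours=2) call.
    """
    shifted = execution_date - timedelta(hours=2)
    return shifted.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Example run with an assumed execution date of 2021-03-01 01:30:00.
    print(two_hours_before(datetime(2021, 3, 1, 1, 30, 0)))
```

Computing the string in Python first makes it easy to check the format before passing the value into the SQL template.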
3
votes
5 answers

Get List of all the dags in python

I have a list of DAGs that are hosted on Airflow. I want to get the names of the DAGs in an AWS Lambda function so that I can use the names and trigger each DAG using the experimental API. I am stuck on getting the names of the DAGs. Any help would be…
AKASH AGGARWAL
  • 31
  • 1
  • 1
  • 2
3
votes
0 answers

Handling Airflow DAG changes through time (DAG Versioning)

We have a relatively complex dynamic DAG as part of our ETL. The DAG contains hundreds of transformations and is created programmatically from a set of YAML files. It changes over time: new tasks are added, queries executed by tasks are changed…
partlov
  • 13,789
  • 6
  • 63
  • 82
3
votes
0 answers

Airflow: Why do DAG tasks run outdated DAG code?

I am running Airflow (1.10.9) through Cloud Composer (1.11.1) on GCP. Whenever I update a DAG's code I can see the updated code refreshed in the Airflow GUI but for at least 10 minutes the DAG's tasks still run the old code. A couple of…
AYR
  • 1,139
  • 3
  • 14
  • 24
3
votes
1 answer

How to dynamically generate Airflow tasks in a loop and run them in parallel?

I have a use case in which I am downloading some JSON files and parsing them. Depending on the files that are downloaded, the program needs to populate data in different tables. Once the data is loaded in the tables, an email notification must be…
Diablo3093
  • 963
  • 4
  • 15
  • 26
3
votes
1 answer

How to trigger Airflow DAG from AWS SQS?

I would like to trigger an Airflow DAG based on SQS messages. I am quite new to Airflow but this is how I think it should be done: Option 1: Use the Airflow SQS Sensor. From my understanding, this waits on SQS messages to proceed with the execution…
ypicard
  • 3,593
  • 3
  • 20
  • 34
3
votes
3 answers

How can we validate that there are no cycles in the DAG objects?

I am writing a unit test for my ETLs and, as part of the process, I want to test all DAGs to make sure that they do not have cycles. After reading Data Pipelines with Apache Airflow by Bas Harenslak and Julian de Ruiter I see they are using DAG.test_cycle(),…
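As an illustration of the kind of check the question is after, here is a minimal sketch of cycle detection on a plain task-dependency mapping using depth-first search. It does not use or reproduce Airflow's own API; the `find_cycle` function and the dictionary layout (task id mapped to its downstream task ids) are assumptions made for this example.

```python
from typing import Dict, List


def find_cycle(downstream: Dict[str, List[str]]) -> List[str]:
    """Return one cycle as a list of task ids, or [] if the graph is acyclic.

    Hypothetical helper: `downstream` maps each task id to the ids that run
    after it. This is a generic DFS, not Airflow's implementation.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {task: WHITE for task in downstream}
    path: List[str] = []

    def visit(task: str) -> List[str]:
        color[task] = GREY
        path.append(task)
        for nxt in downstream.get(task, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so the loop is closed
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[task] = BLACK
        return []

    for task in list(downstream):
        if color[task] == WHITE:
            found = visit(task)
            if found:
                return found
    return []


if __name__ == "__main__":
    print(find_cycle({"extract": ["transform"], "transform": ["load"], "load": []}))
    print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

In a unit test, asserting that the function returns an empty list for every dependency mapping gives a failure message that names the offending tasks when a cycle exists.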
3
votes
1 answer

How does Airflow integration with Kerberos work?

We are using Airflow 2.0.1 with the following settings: Celery executor and 4 workers on 4 nodes; most of our tasks run some Hadoop applications launched via BashOperator; using impersonation; using just the default queue. Firstly, we set up our own job that…
3
votes
2 answers

Can't init db for Airflow docker-compose: permission denied

I am trying to run the docker-compose file from the main airflow website and when I try to do docker-compose up airflow-init it fails and gives me: airflow-init_1 | with open(AIRFLOW_CONFIG, 'w') as file: airflow-init_1 |…
Spencer Trinh
  • 743
  • 12
  • 31
3
votes
1 answer

Airflow UI not able to find the provider modules

I have installed Airflow on a server running Ubuntu and Python 3.8. I'm trying to import a simple DAG in the Airflow UI to list the files in the bucket. from airflow import DAG from airflow.providers.amazon.aws.operators.s3_copy_object import…
3
votes
2 answers

Use kwargs and context simultaneously in Airflow

I'm using Airflow 1.10.11. From a previous question ("Can I use a TriggerDagRunOperator to pass a parameter to the triggered dag? Airflow") I know that I can send a parameter using a TriggerDagRunOperator. But my new question is: Can I use the parameter…
Israel Rodriguez
  • 425
  • 1
  • 6
  • 24
3
votes
1 answer

Airflow 2.0 issues: too many Airflow supervisor tasks

I installed Airflow 2.0 using Docker Swarm and the Celery executor. After 1 week, the Celery workers' memory is being exhausted by Airflow task supervisor processes (screenshot attached). Has anyone faced such issues? Any suggestions?
Ganesh
  • 677
  • 8
  • 11
3
votes
1 answer

Airflow: trigger Spark in different Docker container

I have both Airflow 2 (the official image) and Apache Spark running in a docker-compose pipeline. I would like to execute a DAG triggering a Spark script by means of the SparkSubmitOperator…
Requin
  • 467
  • 4
  • 16
3
votes
0 answers

Airflow: How to ensure that two DAGs are not running at the same time

I am creating an Airflow pipeline for pulling comment data from an API for a popular forum. For this I am creating two separate DAGs: one DAG with schedule_interval set to every minute that checks for new posts and inserts these posts into a…