Questions tagged [airflow]

Apache Airflow is a workflow management platform to programmatically author, schedule, and monitor workflows as directed acyclic graphs (DAGs) of tasks.

Airflow is a workflow scheduler. It was developed by Airbnb to manage its complicated workflows.
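
Outside Airflow, a "DAG of tasks" can be pictured as a mapping from each task to the tasks it depends on. A topological sort of that mapping gives one order in which the tasks could run. The sketch below uses Python's standard `graphlib` module (3.9+); the task names are hypothetical and none of this is Airflow's own API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(deps):
    """Return one valid run order; raises graphlib.CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(dependencies))
```

The "acyclic" part matters: a cycle has no valid order, which is why `TopologicalSorter` refuses it.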

References

Related Tags

Similar workflow schedulers:

10104 questions
46 votes · 3 answers

Airflow backfill clarification

I'm just getting started with Airbnb's Airflow, and I'm still not clear on how/when backfilling is done. Specifically, there are 2 use cases that confuse me: If I run airflow scheduler for a few minutes, stop it for a minute, then restart it…
diego_c
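
When catch-up is enabled (the `catchup` argument of a DAG), the scheduler creates runs for schedule intervals that elapsed while it was not running, and a run for an interval is only created once that interval has ended. The sketch below is a simplified, plain-Python model of that rule, assuming a fixed interval; it ignores `max_active_runs`, timetables, and every other scheduler detail.

```python
from datetime import datetime, timedelta


def missing_intervals(start_date, now, interval, completed):
    """Simplified catch-up model: return the start of every interval that has
    fully elapsed by `now` but has no completed run, oldest first.

    A run covering [t, t + interval) only becomes due once t + interval <= now.
    """
    pending = []
    t = start_date
    while t + interval <= now:
        if t not in completed:
            pending.append(t)
        t += interval
    return pending


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    done = {start, start + timedelta(hours=1), start + timedelta(hours=2)}
    # Scheduler was down for a while; the 03:00 and 04:00 intervals elapsed without runs.
    print(missing_intervals(start, datetime(2024, 1, 1, 5, 30), timedelta(hours=1), done))
```

Under this model, stopping the scheduler for a minute only produces extra runs if an interval boundary was crossed during the outage.
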
45 votes · 8 answers

setting up s3 for logs in airflow

I am using docker-compose to set up a scalable airflow cluster. I based my approach off of this Dockerfile https://hub.docker.com/r/puckel/docker-airflow/ My problem is getting the logs set up to write/read from s3. When a dag has completed I get an…
JackStat
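
In a docker-compose setup, Airflow settings are often supplied as environment variables of the form `AIRFLOW__{SECTION}__{KEY}`, which override the matching `airflow.cfg` entry. The helper below builds the remote-logging overrides. It assumes Airflow 2.x, where these keys live under `[logging]` (Airflow 1.10 kept them under `[core]`). The bucket path and connection id are hypothetical placeholders.

```python
def airflow_env(section, key, value):
    """Build one config override as (NAME, value) using Airflow's
    AIRFLOW__{SECTION}__{KEY} environment-variable convention."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}", str(value)


def s3_logging_env(bucket_path, conn_id, section="logging"):
    # section="logging" matches Airflow 2.x; pass section="core" for Airflow 1.10.
    settings = {
        "remote_logging": "True",
        "remote_base_log_folder": bucket_path,
        "remote_log_conn_id": conn_id,
    }
    return dict(airflow_env(section, key, value) for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical bucket and connection id; replace with your own.
    for name, value in s3_logging_env("s3://my-bucket/airflow/logs", "my_s3_conn").items():
        print(f"{name}={value}")
```

The printed `NAME=value` lines can be pasted into the `environment:` section of each Airflow service so the webserver, scheduler, and workers all agree on where logs live.
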
43 votes · 12 answers

Can't import Airflow plugins

Following Airflow tutorial here. Problem: The webserver returns the following error Broken DAG: [/usr/local/airflow/dags/test_operator.py] cannot import name MyFirstOperator Notes: The directory structure looks like this: airflow_home ├──…
Christopher Carlson
42 votes · 4 answers

How to limit Airflow to run only one instance of a DAG run at a time?

I want the tasks in the DAG to all finish before the 1st task of the next run gets executed. I have max_active_runs = 1, but this still happens. default_args = { 'depends_on_past': True, 'wait_for_downstream': True, 'max_active_runs':…
Tin Ng
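
In the excerpt, `max_active_runs` appears inside `default_args`. In Airflow it is an argument of `DAG(...)` itself, while `default_args` are applied to the operators, so it probably has no effect there. The toy model below (plain `threading`, not Airflow) shows what a DAG-level cap of 1 is meant to guarantee: one run's tasks all finish before the next run's first task starts. The task names are hypothetical.

```python
import threading
import time


def run_dag_runs(run_ids, max_active_runs=1):
    """Toy model: each 'DAG run' executes its tasks in order, and a semaphore
    caps how many runs may be active at once. Returns the peak number of
    concurrently active runs and the order in which tasks started."""
    gate = threading.BoundedSemaphore(max_active_runs)
    lock = threading.Lock()
    active = 0
    peak = 0
    log = []

    def dag_run(run_id):
        nonlocal active, peak
        with gate:  # blocks while max_active_runs runs are already in progress
            with lock:
                active += 1
                peak = max(peak, active)
            for task in ("first", "second", "last"):
                with lock:
                    log.append((run_id, task))
                time.sleep(0.01)  # stand-in for real work
            with lock:
                active -= 1

    threads = [threading.Thread(target=dag_run, args=(run_id,)) for run_id in run_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak, log


if __name__ == "__main__":
    peak, log = run_dag_runs(["run_1", "run_2", "run_3"])
    print(peak, log)
```

Raising `max_active_runs` in this model lets task entries from different runs interleave in `log`, which mirrors the behaviour described in the question.
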
42 votes · 4 answers

How to add new DAGs to Airflow?

I have defined a DAG in a file called tutorial_2.py (actually a copy of the tutorial.py provided in the airflow tutorial, except with the dag_id changed to tutorial_2). When I look inside my default, unmodified airflow.cfg (located in ~/airflow), I…
Aleksey Bilogur
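
A new DAG file is only picked up if it sits under the configured `dags_folder`. With safe mode on (the default), Airflow also skips files that do not mention both "airflow" and "dag". The sketch below is a simplified, plain-Python re-creation of that check for debugging, assuming the default `dags_folder` of `{AIRFLOW_HOME}/dags` with `AIRFLOW_HOME=~/airflow`. It is not Airflow's own loader.

```python
from pathlib import Path


def might_contain_dag(source_text):
    """Simplified version of Airflow's safe-mode check: a file is only parsed
    if it mentions both 'airflow' and 'dag' (case-insensitive)."""
    text = source_text.lower()
    return "airflow" in text and "dag" in text


def candidate_dag_files(dags_folder):
    """List .py files under dags_folder that pass the check above."""
    folder = Path(dags_folder).expanduser()
    if not folder.is_dir():
        return []
    return [
        path
        for path in sorted(folder.rglob("*.py"))
        if might_contain_dag(path.read_text(errors="ignore"))
    ]


if __name__ == "__main__":
    # Assumes the default dags_folder location; adjust to the value in your airflow.cfg.
    for path in candidate_dag_files("~/airflow/dags"):
        print(path)
```

If a copied tutorial file does not show up in this list, it is either outside `dags_folder` or being skipped before parsing.
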
41 votes · 1 answer

How does the >> operator define task dependencies in Airflow?

I was going through Apache Airflow tutorial https://github.com/hgrif/airflow-tutorial and encountered this section for defining task dependencies. with DAG('airflow_tutorial_v01', default_args=default_args, schedule_interval='0 * * * *', …
idazuwaika
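
`>>` works because Python lets a class overload the right-shift operator. The toy `Task` class below, loosely modeled on how Airflow operators record dependencies, implements `__rshift__` (for `a >> b` and `a >> [b, c]`) and `__rrshift__` (for `[a, b] >> c`, where Python falls back to the right operand because `list` has no matching `__rshift__`). The class and task names are hypothetical, not Airflow's implementation.

```python
class Task:
    """Toy stand-in for an operator; records dependencies by task_id."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, others):
        # Accept a single task or a list/tuple of tasks.
        if not isinstance(others, (list, tuple)):
            others = [others]
        for other in others:
            self.downstream.add(other.task_id)
            other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # task >> other: task must finish before other starts.
        self.set_downstream(other)
        return other  # returning the right operand makes a >> b >> c chain left to right

    def __rrshift__(self, others):
        # [a, b] >> task: list has no __rshift__ for Task, so Python falls back to this.
        for upstream in others:
            upstream.set_downstream(self)
        return self


if __name__ == "__main__":
    extract, transform, load = Task("extract"), Task("transform"), Task("load")
    extract >> transform >> load
    print(sorted(transform.upstream), sorted(transform.downstream))
```

Because `>>` is left-associative, `a >> b >> c` evaluates as `(a >> b) >> c`, which is why `__rshift__` returns its right operand.
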
41 votes · 3 answers

Importing local module (python script) in Airflow DAG

I'm trying to import a local module (a python script) into my DAG. Directory structure:

airflow/
├── dag
│   ├── __init__.py
│   └── my_DAG.py
└── script
    └── subfolder
        ├── __init__.py
        └── local_module.py

Sample code in…
hotchocolate
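
Airflow typically puts the DAGs folder on `sys.path`, but a sibling directory such as `airflow/script/` is not, so `import subfolder.local_module` fails until that directory is added. The helper below is a minimal, stdlib-only sketch of doing that explicitly; the helper name and the layout in the comment are assumptions based on the directory tree in the question.

```python
import importlib
import sys
from pathlib import Path


def import_from(directory, module_name):
    """Put `directory` on sys.path (once) and import `module_name` from it."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)


# Hypothetical use inside airflow/dag/my_DAG.py:
#   script_dir = Path(__file__).resolve().parent.parent / "script"
#   local_module = import_from(script_dir, "subfolder.local_module")
```

A common alternative is to move `script/` under the DAGs folder (or install it as a package) so no path manipulation is needed.
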
41 votes · 8 answers

Store and access a password using Apache Airflow

We are using Airflow as a scheduler. I want to invoke a simple bash operator in a DAG. The bash script needs a password as an argument to do further processing. How can I store a password securely in Airflow (config/variables/connection) and access…
Anup
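
Airflow can read a Variable named `foo` from an environment variable `AIRFLOW_VAR_FOO`, in addition to the metadata database and any configured secrets backend. The stdlib-only sketch below mimics just the environment-variable part of that lookup, and shows handing the secret to a child process through its environment rather than as a command-line argument, which keeps it out of `ps` output and logged commands. The helper names are hypothetical; in a real DAG, `BashOperator` accepts an `env` mapping for the same purpose.

```python
import os


def get_variable(key, default=None):
    """Simplified lookup modeled on Airflow's AIRFLOW_VAR_<KEY> convention.
    Real Airflow also checks secrets backends and the metadata database."""
    return os.environ.get(f"AIRFLOW_VAR_{key.upper()}", default)


def child_env(secret_name, secret_value):
    """Environment for a child process: the current environment plus one secret.
    The script then reads it as $SECRET_NAME instead of receiving it in argv."""
    env = dict(os.environ)
    env[secret_name] = secret_value
    return env
```
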
41 votes · 4 answers

How to run Spark code in Airflow?

Hello people of the Earth! I'm using Airflow to schedule and run Spark tasks. So far, all I have found are Python DAGs that Airflow can manage. DAG example: spark_count_lines.py import logging from airflow import DAG from airflow.operators import…
40 votes · 2 answers

How to pass a parameter to PythonOperator in Airflow

I just started using Airflow, can anyone enlighten me how to pass a parameter into PythonOperator like below: t5_send_notification = PythonOperator( task_id='t5_send_notification', provide_context=True, python_callable=SendEmail, …
mdivk
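
`PythonOperator` accepts `op_args` (a list) and `op_kwargs` (a dict) and passes them to `python_callable` as positional and keyword arguments. The toy function below mimics only that calling convention in plain Python; `send_email` is a hypothetical stand-in for `SendEmail` in the question.

```python
def call_like_python_operator(python_callable, op_args=None, op_kwargs=None):
    """Toy version of how PythonOperator invokes its callable: positional
    values from op_args, keyword values from op_kwargs."""
    return python_callable(*(op_args or []), **(op_kwargs or {}))


def send_email(recipient, subject="Airflow notification"):
    # Hypothetical callable standing in for SendEmail.
    return f"to={recipient} subject={subject}"


if __name__ == "__main__":
    print(call_like_python_operator(send_email, op_kwargs={"recipient": "team@example.com"}))
```

In a DAG, the equivalent would be passing `op_kwargs={"recipient": ...}` alongside `python_callable=SendEmail` when constructing the operator.
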
40 votes · 3 answers

TemplateNotFound error when running simple Airflow BashOperator

I'm trying to write our first Airflow DAG, and I'm getting the following error when I try to list the tasks using command airflow list_tasks orderwarehouse: Traceback (most recent call last): File…
quaintm
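
A common cause of `TemplateNotFound` with `BashOperator` is a `bash_command` that ends in `.sh`: because `.sh` and `.bash` are in the operator's `template_ext`, Airflow treats the string as a path to a Jinja template file and looks it up relative to the DAG folder (or `template_searchpath`). The plain-Python check below models only that suffix rule; the paths are hypothetical.

```python
# Modeled on BashOperator, whose template_ext is (".sh", ".bash").
TEMPLATE_EXT = (".sh", ".bash")


def treated_as_template_file(bash_command, template_ext=TEMPLATE_EXT):
    """Simplified model: a templated string ending in one of template_ext is
    loaded as a Jinja template *file* instead of being rendered inline."""
    return bash_command.endswith(template_ext)


if __name__ == "__main__":
    for cmd in ("/home/me/run.sh", "/home/me/run.sh ", "bash /home/me/run.sh --verbose"):
        print(repr(cmd), treated_as_template_file(cmd))
```

This is why the usual workarounds are a trailing space after the script path or setting `template_searchpath` so the file can actually be found.
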
40 votes · 15 answers

DAG not visible in Web-UI

I am new to Airflow. I am following a tutorial and have written the following code. from airflow import DAG from airflow.operators.python_operator import PythonOperator from datetime import datetime, timedelta from models.correctness_prediction import…
Rusty
39 votes · 3 answers

Running airflow tasks/dags in parallel

I'm using airflow to orchestrate some python scripts. I have a "main" dag from which several subdags are run. My main dag is supposed to run according to the following overview: I've managed to get to this structure in my main dag by using the…
Mr. President
39 votes · 1 answer

Airflow - run task regardless of upstream success/fail

I have a DAG which fans out to multiple independent units in parallel. This runs in AWS, so we have tasks which scale our AutoScalingGroup up to the maximum number of workers when the DAG starts, and to the minimum when the DAG completes. The…
J. Doe
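
Whether a task runs after its upstream tasks is governed by its trigger rule. The default, `all_success`, requires every upstream task to succeed, whereas `all_done` only requires them all to have finished in any state, which fits a scale-down task that must run even if some branches fail. The plain-Python function below models just these two rules, using Airflow's state names; it is a simplification, not Airflow's evaluator.

```python
# Terminal states in this simplified model.
FINISHED = {"success", "failed", "skipped", "upstream_failed"}


def should_run(trigger_rule, upstream_states):
    """Simplified evaluation of two Airflow trigger rules."""
    if trigger_rule == "all_success":
        return all(state == "success" for state in upstream_states)
    if trigger_rule == "all_done":
        return all(state in FINISHED for state in upstream_states)
    raise ValueError(f"rule not modeled here: {trigger_rule}")


if __name__ == "__main__":
    states = ["success", "failed", "success"]
    print(should_run("all_success", states), should_run("all_done", states))
```

In a DAG, the scale-down task would be given `trigger_rule="all_done"` when it is constructed.
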
39 votes · 1 answer

Airbnb Airflow vs Apache Nifi

Do Airflow and Nifi perform the same job on workflows? What are the pros and cons of each? I need to read some json files, add more custom metadata to them and put them in a Kafka queue to be processed. I was able to do it in Nifi. I am still working on…
CMPE